Abstract

Adaptive networks consist of a collection of nodes with adaptation and learning abilities. The nodes 
interact with each other on a local level and diffuse information across the network to solve estimation 
and inference tasks in a distributed manner. In this work, we compare the mean-square performance of 
two main strategies for distributed estimation over networks: consensus strategies and diffusion strategies. 
The analysis in the paper confirms that under constant step-sizes, diffusion strategies allow information 
to diffuse more thoroughly through the network and this property has a favorable effect on the evolution 
of the network: diffusion networks are shown to converge faster and reach lower mean-square deviation 
than consensus networks, and their mean-square stability is insensitive to the choice of the combination 
weights. In contrast, and surprisingly, it is shown that consensus networks can become unstable even if 
all the individual nodes are stable and able to solve the estimation task on their own. When this occurs, 
cooperation over the network leads to a catastrophic failure of the estimation task. This phenomenon 
does not occur for diffusion networks: we show that stability of the individual nodes always ensures 
stability of the diffusion network irrespective of the combination topology. Simulation results support 
the theoretical findings. 
I. Introduction

Adaptive networks consist of a collection of spatially distributed nodes that are linked together through 
a topology and that cooperate with each other through local interactions. Adaptive networks are well- 
suited to perform decentralized information processing and inference tasks Q, and to model complex 
and self-organized behavior encountered in biological systems (H, Q. 

We examine two types of fully decentralized strategies, namely, consensus strategies and diffusion 
strategies. The consensus strategy was originally proposed in the statistics literature [6] and has since
then been developed into an elegant procedure to enforce agreement among cooperating nodes. Average 
consensus and gossip algorithms have been studied extensively in recent years, especially in the control 
literature [7]-[12], and applied to the study of multi-agent formations [13], [14], distributed optimization
[15], [16], and distributed estimation problems [17]-[19]. Original implementations of the consensus
strategy relied on the use of two time-scales [20]-[22]: one time-scale for the collection of measurements
across the nodes and another time-scale to iterate sufficiently over the collected data to attain
agreement before the process is repeated. Unfortunately, two time-scale implementations hinder the ability 
to perform real-time recursive estimation and adaptation when measurement data keep streaming in. For 
this reason, in this work, we focus instead on consensus implementations that operate in a single
time-scale. Such implementations appear in several recent works, including [16]-[19], and are largely motivated
by the procedure developed earlier in [15], [23] for the solution of distributed optimization problems.

The second class of algorithms that we consider deals with diffusion strategies, which were originally 
introduced for the solution of distributed estimation and adaptation problems in [2], [3], [24]-[26]. The
main motivation for the introduction of diffusion strategies in these works was the desire to develop 
distributed schemes that are able to respond in real-time to continuous streaming of data at the nodes by 
operating over a single time-scale. A useful overview of diffusion strategies appears in [27]. Since their
inception, diffusion strategies have been applied to model various forms of complex behavior encountered 
in nature |@], 0; they have also been adopted to solve distributed optimization problems advantageously 
in [28]-[30] and have been studied under varied conditions in [31]-[34] as well. Diffusion strategies
are inherently single time-scale implementations and are therefore naturally amenable to real-time and 
recursive implementations. It turns out that the dynamics of the consensus and diffusion strategies differ 
in important ways, which in turn impact the mean-square behavior of the respective networks in a 
fundamental manner.

The analysis in this paper will confirm that under constant step-sizes, diffusion strategies allow informa- 
tion to diffuse more thoroughly through networks and this property has a favorable effect on the evolution 
of the network. It will be shown that diffusion networks converge faster and reach lower mean-square 
deviation than consensus networks, and their mean-square stability is insensitive to the choice of the 
combination weights. In comparison, and surprisingly, it is shown that consensus networks can become 
unstable even if all the individual nodes are stable and able to solve the estimation task on their own. In other
words, the learning curve of a cooperative consensus network can diverge even if the learning curves for 
the non-cooperative individual nodes converge. When this occurs, cooperation over the network leads to 
a catastrophic failure of the estimation task. This behavior does not occur for diffusion networks: we 
will show that stability of the individual nodes is sufficient to ensure stability of the diffusion network 
regardless of the combination weights. The properties revealed in this paper indicate that there needs 
to be some care with the use of consensus strategies for adaptation because they can lead to network 
failure even if the individual nodes are stable and well-behaved. The analysis also suggests that diffusion 
strategies provide a proper way to enforce cooperation over networks; their operation is such that diffusion 
networks will always remain stable irrespective of the combination topology. 



II. Estimation Strategies over Networks

Consider a network consisting of $N$ nodes distributed over a spatial domain. Two nodes are said to
be neighbors if they can exchange information. The neighborhood of node $k$ is denoted by $\mathcal{N}_k$. The
nodes in the network would like to estimate an unknown $M \times 1$ vector, $w^o$. At every time instant, $i$,
each node $k$ is able to observe realizations $\{d_k(i), u_{k,i}\}$ of a scalar random process $\boldsymbol{d}_k(i)$ and a $1 \times M$
vector random process $\boldsymbol{u}_{k,i}$ with a positive-definite covariance matrix, $R_{u,k} = E\,\boldsymbol{u}_{k,i}^* \boldsymbol{u}_{k,i} > 0$, where
$E$ denotes the expectation operator. All vectors in our treatment are column vectors with the exception
of the regression vector, $\boldsymbol{u}_{k,i}$, which is taken to be a row vector for convenience of presentation. The
random processes $\{\boldsymbol{d}_k(i), \boldsymbol{u}_{k,i}\}$ are related to $w^o$ via the linear regression model [35]:

$$\boldsymbol{d}_k(i) = \boldsymbol{u}_{k,i} w^o + \boldsymbol{v}_k(i) \tag{1}$$

where $\boldsymbol{v}_k(i)$ is measurement noise with variance $\sigma_{v,k}^2$ and assumed to be temporally white and spatially
independent, i.e.,

$$E\,\boldsymbol{v}_k^*(i) \boldsymbol{v}_l(j) = \sigma_{v,k}^2 \cdot \delta_{kl} \cdot \delta_{ij} \tag{2}$$
in terms of the Kronecker delta function. The regression data $\boldsymbol{u}_{k,i}$ are likewise assumed to be temporally
white and spatially independent. The noise $\boldsymbol{v}_k(i)$ and the regressors $\{\boldsymbol{u}_{l,j}\}$ are assumed to be independent
of each other for all $\{k, l, i, j\}$. All random processes are assumed to be zero mean. Note that we use
boldface letters to denote random quantities and normal letters to denote their realizations or deterministic
quantities. Models of the form (1) are useful in capturing many situations of interest, such as estimating
the parameters of some underlying physical phenomenon, tracking a moving target by a collection of 
nodes, or estimating the location of a nutrient source or predator in biological networks (see, e.g., EJ, 
0, |[35l ); these models are also useful in the study of the performance limits of combinations of adaptive 
filters [36]-[39].

The objective of the network is to estimate $w^o$ in a distributed manner through an online learning
process. The nodes estimate $w^o$ by seeking to minimize the following global cost function:

$$J^{\mathrm{glob}}(w) \triangleq \sum_{k=1}^{N} E\,|\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} w|^2. \tag{3}$$

In the sequel, we describe the algorithms pertaining to the consensus and diffusion strategies that we 
study in this article, in addition to the non-cooperative mode of operation. Afterwards, we move on to the 
main theme of this work, which is to show why diffusion networks outperform consensus networks. We 
may remark that the same strategies can be used to optimize global cost functions where the individual 
costs are not necessarily quadratic in $w$ as in (3). Most of the mean-square analysis performed here can
be extended to this more general scenario; see, e.g., [30], [40] and the references therein.



A. Non-Cooperative Strategy 

In the non-cooperative mode of operation, each node $k$ operates independently of the other nodes and
estimates $w^o$ by means of a local LMS adaptive filter applied to its data $\{\boldsymbol{d}_k(i), \boldsymbol{u}_{k,i}\}$. The filter update
takes the following form EH, PT1 : 



$$\boldsymbol{w}_{k,i} = \boldsymbol{w}_{k,i-1} + \mu_k \boldsymbol{u}_{k,i}^* [\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1}] \qquad \text{(non-cooperative strategy)} \tag{4}$$

where $\mu_k > 0$ is the constant step-size used by node $k$. In (4), the vector $\boldsymbol{w}_{k,i}$ denotes the estimate for
$w^o$ that is computed by node $k$ at time $i$. Note that for the underlying model where $R_{u,k} > 0$ for all $k$,
every individual node can employ (4) to estimate $w^o$ independently if desired. Studies allowing for other
observability conditions for diffusion and consensus strategies, including possibly singular covariance
matrices, appear in [18], [42].
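As a minimal sketch of recursion (4), the following Python fragment runs a single real-valued LMS node on data generated according to model (1). The helper names (`lms_step`, `run_lms`), the Gaussian regressors with identity covariance, and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import random


def lms_step(w, u, d, mu):
    """One real-valued LMS update as in (4): w + mu * u^T * (d - u w)."""
    err = d - sum(uj * wj for uj, wj in zip(u, w))
    return [wj + mu * uj * err for wj, uj in zip(w, u)]


def run_lms(w_true, mu, sigma_v, n_iter, seed=0):
    """Run a stand-alone (non-cooperative) LMS node on synthetic data."""
    rng = random.Random(seed)
    w = [0.0] * len(w_true)
    for _ in range(n_iter):
        # Assumed regressor model: zero-mean white Gaussian with R_u = I.
        u = [rng.gauss(0.0, 1.0) for _ in w_true]
        # Linear regression model (1): d = u w^o + v.
        d = sum(uj * wj for uj, wj in zip(u, w_true)) + rng.gauss(0.0, sigma_v)
        w = lms_step(w, u, d, mu)
    return w


w_true = [1.0, -0.5]  # hypothetical w^o
w_hat = run_lms(w_true, mu=0.05, sigma_v=0.1, n_iter=5000)
```

Because the step-size is constant, the estimate `w_hat` keeps fluctuating in a small neighborhood of `w_true` instead of converging to it exactly; this is the price paid for continuous adaptation.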
B. Cooperative Strategies

In the cooperative mode of operation, nodes interact with their neighbors by sharing information. In 
this article, we study three cooperative strategies for distributed estimation. 

B.1. Consensus Strategy: The consensus strategy often appears in the literature in the following form
(see, e.g., Eq. (1.20) in flTU, Eq. (19) in HH, and Eq. (9) in lfT8l ): 

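$$\boldsymbol{w}_{k,i} = \boldsymbol{w}_{k,i-1} - \mu_k(i) \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k} (\boldsymbol{w}_{k,i-1} - \boldsymbol{w}_{l,i-1}) + \mu_k(i)\, \boldsymbol{u}_{k,i}^* [\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1}] \tag{5}$$

where $\{b_{l,k}\}$ is a set of nonnegative coefficients. It should be noted that in most works on consensus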

implementations, especially in the context of distributed optimization problems |[T6l - |[T8l , E3l , ESI , the 
step-sizes $\{\mu_k(i)\}$ that are used in (5) depend on the time-index $i$ and are required to satisfy

$$\sum_{i=0}^{\infty} \mu_k(i) = \infty \quad \text{and} \quad \sum_{i=0}^{\infty} \mu_k^2(i) < \infty. \tag{6}$$

In other words, for each node $k$, the step-size sequence $\mu_k(i)$ is required to vanish as $i \to \infty$. Under
such conditions, it is known that consensus strategies allow the nodes to reach agreement about w° |[T6l , 
ifTHTl . ll43l . ll44l . Here, instead, we will use constant step-sizes {fi k }. This is because we are interested in 
studying the adaptation and learning abilities of the networks. Constant step-sizes are critical to endow 
networks with continuous adaptation and tracking abilities; otherwise, under (6), once the step-sizes have
decayed to zero, the network stops adapting and learning is turned off.

We can rewrite recursion (5) in a more compact and revealing form by combining the first two terms
on the right-hand side of (5) and by introducing the following coefficients:



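$$a_{l,k} = \begin{cases} 1 - \sum_{j \in \mathcal{N}_k \setminus \{k\}} \mu_k b_{j,k}, & \text{if } l = k \\ \mu_k b_{l,k}, & \text{if } l \in \mathcal{N}_k \setminus \{k\} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$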



In this way, recursion (|5]) can be rewritten equivalently as (see, e.g., expression (7.1) in E3l and expression 
(1.20) in Ull): 



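$$\boldsymbol{w}_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k} \boldsymbol{w}_{l,i-1} + \mu_k \boldsymbol{u}_{k,i}^* [\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1}] \qquad \text{(consensus strategy)} \tag{8}$$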



The entry $a_{l,k}$ denotes the weight that node $k$ assigns to the estimate $\boldsymbol{w}_{l,i-1}$ received from its neighbor
$l$ (see Fig. 1); note that the weights $\{a_{l,k}\}$ are nonnegative for $l \neq k$ and that $a_{k,k}$ is nonnegative for
sufficiently small step-sizes. If we collect the nonnegative weights $\{a_{l,k}\}$ into an $N \times N$ matrix $A$, then
it follows from (7) that the combination matrix $A$ satisfies the following properties:

$$a_{l,k} \ge 0, \quad A^T \mathbf{1} = \mathbf{1}, \quad \text{and} \quad a_{l,k} = 0 \ \text{if } l \notin \mathcal{N}_k \tag{9}$$

where $\mathbf{1}$ is a vector of size $N$ with all entries equal to one. That is, the weights on the links arriving
at node $k$ add up to one, which is equivalent to saying that the matrix $A$ is left-stochastic. Moreover, if
two nodes $l$ and $k$ are not linked, then their corresponding entry $a_{l,k}$ is zero.

Fig. 1. A connected network showing the neighborhood of node $k$, denoted by $\mathcal{N}_k$. The weight $a_{l,k}$ scales the data transmitted
from node $l$ to node $k$ over the edge linking them.
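The mapping (7) from the consensus coefficients $\{b_{l,k}\}$ to the combination weights $\{a_{l,k}\}$, together with the properties in (9), can be sketched as follows. The function names and the three-node line topology are hypothetical choices made only for illustration; `b[l][k]` is assumed to be zero whenever nodes $l$ and $k$ are not neighbors.

```python
def consensus_weights(b, mu):
    """Build A with A[l][k] = a_{l,k} from (7), for constant step-sizes mu[k]."""
    n = len(mu)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            if l != k:
                A[l][k] = mu[k] * b[l][k]  # off-diagonal weights mu_k * b_{l,k}
        # The self-weight absorbs the remainder so that column k sums to one.
        A[k][k] = 1.0 - sum(A[l][k] for l in range(n) if l != k)
    return A


def is_left_stochastic(A, tol=1e-12):
    """Check (9): nonnegative entries and every column summing to one."""
    n = len(A)
    nonneg = all(A[l][k] >= -tol for l in range(n) for k in range(n))
    cols = all(abs(sum(A[l][k] for l in range(n)) - 1.0) <= tol for k in range(n))
    return nonneg and cols


# Hypothetical line topology 0 - 1 - 2 with unit coefficients on each link.
b = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
A_small = consensus_weights(b, mu=[0.1, 0.2, 0.1])
A_large = consensus_weights(b, mu=[0.1, 0.6, 0.1])
```

With the smaller step-sizes, `A_small` satisfies (9). With `mu[1] = 0.6`, the self-weight of node 1 becomes negative, which illustrates why $a_{k,k}$ is nonnegative only for sufficiently small step-sizes.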

B.2. ATC Diffusion Strategy: Diffusion strategies for the optimization of (3) in a fully decentralized
manner were derived in [2], [3], [24]-[26], [30] by applying a completion-of-squares argument, followed
by a stochastic approximation step and an incremental approximation step — see E71 . The adapt-then- 
combine (ATC) form of the diffusion strategy is described by the following update equations Q: 



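$$\begin{aligned} \boldsymbol{\psi}_{k,i} &= \boldsymbol{w}_{k,i-1} + \mu_k \boldsymbol{u}_{k,i}^* [\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1}] \\ \boldsymbol{w}_{k,i} &= \sum_{l \in \mathcal{N}_k} a_{l,k} \boldsymbol{\psi}_{l,i} \end{aligned} \qquad \text{(ATC diffusion strategy)} \tag{10}$$

The above strategy consists of two steps. The first step of (10) involves local adaptation, where node $k$
uses its own data $\{\boldsymbol{d}_k(i), \boldsymbol{u}_{k,i}\}$ to update its weight estimate from $\boldsymbol{w}_{k,i-1}$ to an intermediate value $\boldsymbol{\psi}_{k,i}$.
The second step of (10) is a consultation (combination) step where the intermediate estimates $\{\boldsymbol{\psi}_{l,i}\}$ from
the neighborhood of node $k$ are combined through weights $\{a_{l,k}\}$ that satisfy (9) to obtain the updated
weight estimate $\boldsymbol{w}_{k,i}$.

B.3. CTA Diffusion Strategy: Another variant of the diffusion strategy is the combine-then-adapt (CTA)
form, which is described by the following update equations:

$$\begin{aligned} \boldsymbol{\psi}_{k,i-1} &= \sum_{l \in \mathcal{N}_k} a_{l,k} \boldsymbol{w}_{l,i-1} \\ \boldsymbol{w}_{k,i} &= \boldsymbol{\psi}_{k,i-1} + \mu_k \boldsymbol{u}_{k,i}^* [\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{\psi}_{k,i-1}] \end{aligned} \qquad \text{(CTA diffusion strategy)} \tag{11}$$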
Thus, comparing the ATC and CTA strategies, we note that the order of the consultation and adaptation
steps is simply reversed. The first step of (11) involves a consultation step, where the existing estimates
$\{\boldsymbol{w}_{l,i-1}\}$ from the neighbors of node $k$ are combined through the weights $\{a_{l,k}\}$. The second step of
(11) is a local adaptation step, where node $k$ uses its own data $\{\boldsymbol{d}_k(i), \boldsymbol{u}_{k,i}\}$ to update its weight estimate
from the intermediate value $\boldsymbol{\psi}_{k,i-1}$ to $\boldsymbol{w}_{k,i}$.

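B.4. Comparing Diffusion and Consensus Strategies: For ease of comparison, we rewrite below the
recursions that correspond to the consensus (8), ATC diffusion (10), and CTA diffusion (11) strategies
in a single update:

$$\text{(consensus)} \quad \boldsymbol{w}_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k} \boldsymbol{w}_{l,i-1} + \mu_k \boldsymbol{u}_{k,i}^* [\boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \boldsymbol{w}_{k,i-1}] \tag{12}$$

$$\text{(ATC diffusion)} \quad \boldsymbol{w}_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k} \left( \boldsymbol{w}_{l,i-1} + \mu_l \boldsymbol{u}_{l,i}^* [\boldsymbol{d}_l(i) - \boldsymbol{u}_{l,i} \boldsymbol{w}_{l,i-1}] \right) \tag{13}$$

$$\text{(CTA diffusion)} \quad \boldsymbol{w}_{k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k} \boldsymbol{w}_{l,i-1} + \mu_k \boldsymbol{u}_{k,i}^* \Big[ \boldsymbol{d}_k(i) - \boldsymbol{u}_{k,i} \sum_{l \in \mathcal{N}_k} a_{l,k} \boldsymbol{w}_{l,i-1} \Big] \tag{14}$$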



Note that the first terms on the right-hand side of these recursions are all the same. For the second terms,
only variable $\boldsymbol{w}_{k,i-1}$ appears in the consensus strategy (12), while the diffusion strategies (13)-(14)
incorporate the estimates $\{\boldsymbol{w}_{l,i-1}\}$ from the neighborhood of node $k$ into the update of $\boldsymbol{w}_{k,i}$. Moreover,
in contrast to the consensus (12) and CTA diffusion (14) strategies, the ATC diffusion strategy (13)
further incorporates the influence of the data $\{\boldsymbol{d}_l(i), \boldsymbol{u}_{l,i}\}$ from the neighborhood of node $k$ into the
update of $\boldsymbol{w}_{k,i}$. These differences in the order by which the computations are performed have important
implications on the evolution of the weight-error vectors across consensus and diffusion networks. It is
important to note that the diffusion strategies (13)-(14) are able to incorporate additional information into
their processing steps without being more complex than the consensus strategy. All three strategies have
the same computational complexity and require sharing the same amount of data (see Table I), as can
be ascertained by comparing the actual implementations (8), (10), and (11). The key fact to note is that
the diffusion implementations first generate an intermediate state variable, which is subsequently used in
the final update. This important ordering of the calculations has a critical influence on the performance
of the algorithms, as we now move on to reveal.
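The difference in ordering among (12), (13), and (14) can be made concrete with a small sketch of one network-wide iteration for each strategy, written for real-valued data. The helper names and the two-node numbers are assumptions introduced for illustration; since $a_{l,k} = 0$ for non-neighbors, the sums run over all nodes without loss of generality.

```python
def dot(u, w):
    return sum(x * y for x, y in zip(u, w))


def combine(A, vecs, k):
    """Combination step: sum over l of a_{l,k} * vecs[l]."""
    m = len(vecs[0])
    return [sum(A[l][k] * vecs[l][j] for l in range(len(vecs))) for j in range(m)]


def adapt(start, point, u, d, mu):
    """Adaptation step: start + mu * u^T * (d - u point)."""
    err = d - dot(u, point)
    return [s + mu * uj * err for s, uj in zip(start, u)]


def consensus_step(W, A, U, D, mu):
    # (12): combine neighbors, but evaluate the error at the node's own w_{k,i-1}.
    return [adapt(combine(A, W, k), W[k], U[k], D[k], mu[k]) for k in range(len(W))]


def atc_step(W, A, U, D, mu):
    # (10)/(13): adapt first, then combine the intermediate estimates psi.
    psi = [adapt(W[k], W[k], U[k], D[k], mu[k]) for k in range(len(W))]
    return [combine(A, psi, k) for k in range(len(W))]


def cta_step(W, A, U, D, mu):
    # (11)/(14): combine first, then adapt starting from the combined estimate.
    psi = [combine(A, W, k) for k in range(len(W))]
    return [adapt(psi[k], psi[k], U[k], D[k], mu[k]) for k in range(len(W))]


# Hypothetical two-node, scalar example (M = 1).
W0 = [[0.0], [2.0]]
A = [[0.5, 0.5], [0.5, 0.5]]
U = [[1.0], [1.0]]
D = [1.0, 1.0]
mu = [0.5, 0.5]
w_cons = consensus_step(W0, A, U, D, mu)
w_atc = atc_step(W0, A, U, D, mu)
w_cta = cta_step(W0, A, U, D, mu)
```

In this toy case both diffusion updates land on the same value at both nodes, whereas the consensus update leaves the two nodes apart because its error term ignores the neighbor's estimate.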



III. Mean-Square Performance Analysis 

The mean-square performance of diffusion networks has been studied in detail in (H, Q, l27l by 
applying energy conservation arguments 1051 , B51 . Following Q, we will first show how to carry out 



the performance analysis in a unified manner that covers both diffusion and consensus strategies (see
Table II further ahead, which highlights how the parameters for both strategies differ). Subsequently, we
use the resulting performance expressions to carry out detailed comparisons and to establish and highlight
some surprising and interesting differences in performance.

TABLE I

Comparison of the number of complex multiplications and additions per iteration, as well as the number of $M \times 1$ vectors that
are exchanged for each iteration of the algorithms at every node $k$. In the table, the symbol $n_k$ denotes the degree of node $k$,
i.e., the size of its neighborhood $\mathcal{N}_k$. Observe that all three strategies have exactly the same computational complexity.

|                  | ATC diffusion (10) | CTA diffusion (11) | Consensus (8) |
|------------------|--------------------|--------------------|---------------|
| Multiplications  | $(n_k + 2)M$       | $(n_k + 2)M$       | $(n_k + 2)M$  |
| Additions        | $(n_k + 1)M$       | $(n_k + 1)M$       | $(n_k + 1)M$  |
| Vector exchanges | $n_k$              | $n_k$              | $n_k$         |

A. Network Error Recursion 

Let the error vector for an arbitrary node $k$ be denoted by

$$\tilde{\boldsymbol{w}}_{k,i} \triangleq w^o - \boldsymbol{w}_{k,i}. \tag{15}$$

We collect all error vectors and step-sizes across the network into a block vector and block matrix:

$$\tilde{\boldsymbol{w}}_i \triangleq \mathrm{col}\{\tilde{\boldsymbol{w}}_{1,i}, \tilde{\boldsymbol{w}}_{2,i}, \ldots, \tilde{\boldsymbol{w}}_{N,i}\} \tag{16}$$

$$\mathcal{M} \triangleq \mathrm{diag}\{\mu_1 I_M, \mu_2 I_M, \ldots, \mu_N I_M\} \tag{17}$$

where the notation $\mathrm{col}\{\cdot\}$ denotes the vector that is obtained by stacking its arguments on top of each
other, and the notation $\mathrm{diag}\{\cdot\}$ constructs a diagonal matrix from its arguments. We further introduce the
extended combination matrix:

$$\mathcal{A} \triangleq A \otimes I_M \tag{18}$$

where the symbol $\otimes$ denotes the Kronecker product of two matrices. This construction replaces each
entry $a_{l,k}$ in $A$ by the $M \times M$ diagonal matrix $a_{l,k} I_M$ in $\mathcal{A}$. Then, if we start from (12), (13), or (14),
and use model £T|), some straightforward algebra similar to 10, |[27l shows that the global error vector 
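$\tilde{\boldsymbol{w}}_i$ for the various strategies evolves according to the following recursion:

$$\tilde{\boldsymbol{w}}_i = \boldsymbol{\mathcal{B}}_i \cdot \tilde{\boldsymbol{w}}_{i-1} - \boldsymbol{y}_i \tag{19}$$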
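where the quantities $\boldsymbol{\mathcal{B}}_i$ and $\boldsymbol{y}_i$ are listed in Table II and where $\boldsymbol{\mathcal{R}}_i$ is a block diagonal matrix and $\boldsymbol{s}_i$ is
a block column vector:

$$\boldsymbol{\mathcal{R}}_i \triangleq \mathrm{diag}\{\boldsymbol{u}_{1,i}^* \boldsymbol{u}_{1,i}, \boldsymbol{u}_{2,i}^* \boldsymbol{u}_{2,i}, \ldots, \boldsymbol{u}_{N,i}^* \boldsymbol{u}_{N,i}\} \tag{20}$$

$$\boldsymbol{s}_i \triangleq \mathrm{col}\{\boldsymbol{u}_{1,i}^* \boldsymbol{v}_1(i), \boldsymbol{u}_{2,i}^* \boldsymbol{v}_2(i), \ldots, \boldsymbol{u}_{N,i}^* \boldsymbol{v}_N(i)\}. \tag{21}$$

The coefficient matrix $\boldsymbol{\mathcal{B}}_i$ is an $N \times N$ block matrix with blocks of size $M \times M$ each. Likewise, the
driving vector $\boldsymbol{y}_i$ is an $N \times 1$ block vector with entries that are $M \times 1$ each. The matrix $\boldsymbol{\mathcal{B}}_i$ controls
the evolution of the network error vector $\tilde{\boldsymbol{w}}_i$. It is obvious from Table II that this matrix is different for
each of the strategies under consideration. We shall verify in the sequel that the differences have critical
ramifications when we compare consensus and diffusion strategies. Note in passing that any of these
three distributed strategies degenerates to the non-cooperative strategy (4) when $A = I_N$.

TABLE II

The network weight error vector evolves according to the recursion $\tilde{\boldsymbol{w}}_i = \boldsymbol{\mathcal{B}}_i \cdot \tilde{\boldsymbol{w}}_{i-1} - \boldsymbol{y}_i$, where the variables $\{\boldsymbol{\mathcal{B}}_i, \boldsymbol{y}_i\}$, and
their respective means or covariances, are listed below for three cooperative strategies and the non-cooperative strategy.

|                                                           | ATC diffusion (10)                                          | CTA diffusion (11)                                          | Consensus (8)                                        | Non-cooperative (4)                           |
|-----------------------------------------------------------|-------------------------------------------------------------|-------------------------------------------------------------|------------------------------------------------------|-----------------------------------------------|
| $\boldsymbol{\mathcal{B}}_i$                              | $\mathcal{A}^T (I_{NM} - \mathcal{M}\boldsymbol{\mathcal{R}}_i)$ | $(I_{NM} - \mathcal{M}\boldsymbol{\mathcal{R}}_i)\mathcal{A}^T$ | $\mathcal{A}^T - \mathcal{M}\boldsymbol{\mathcal{R}}_i$ | $I_{NM} - \mathcal{M}\boldsymbol{\mathcal{R}}_i$ |
| $\mathcal{B} = E\,\boldsymbol{\mathcal{B}}_i$             | $\mathcal{A}^T (I_{NM} - \mathcal{M}\mathcal{R})$           | $(I_{NM} - \mathcal{M}\mathcal{R})\mathcal{A}^T$            | $\mathcal{A}^T - \mathcal{M}\mathcal{R}$             | $I_{NM} - \mathcal{M}\mathcal{R}$             |
| $\boldsymbol{y}_i$                                        | $\mathcal{A}^T \mathcal{M} \boldsymbol{s}_i$                | $\mathcal{M} \boldsymbol{s}_i$                              | $\mathcal{M} \boldsymbol{s}_i$                       | $\mathcal{M} \boldsymbol{s}_i$                |
| $\mathcal{Y} = E\,\boldsymbol{y}_i \boldsymbol{y}_i^*$    | $\mathcal{A}^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}$ | $\mathcal{M} \mathcal{S} \mathcal{M}$                   | $\mathcal{M} \mathcal{S} \mathcal{M}$                | $\mathcal{M} \mathcal{S} \mathcal{M}$         |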

B. Mean Stability 

We start our analysis by examining the stability in the mean of the networks, i.e., the stability of
the recursion for $E\,\tilde{\boldsymbol{w}}_i$. Note first that the matrices $\{\boldsymbol{\mathcal{B}}_i\}$ in Table II are random matrices due to the
randomness of the regressors in $\boldsymbol{\mathcal{R}}_i$. In other words, the evolution of the networks is stochastic in
nature. Now, since the regressors $\{\boldsymbol{u}_{k,i}\}$ are temporally white and spatially independent, the $\{\boldsymbol{\mathcal{B}}_i\}$
are independent of $\tilde{\boldsymbol{w}}_{i-1}$ for any of the strategies. Moreover, since the $\{\boldsymbol{u}_{k,i}, \boldsymbol{v}_k(i)\}$ are independent of
each other, the $\{\boldsymbol{y}_i\}$ are zero mean. Taking expectation of both sides of (19), we find that the mean
of $\tilde{\boldsymbol{w}}_i$ evolves in time according to the recursion:

$$E\,\tilde{\boldsymbol{w}}_i = \mathcal{B} \cdot E\,\tilde{\boldsymbol{w}}_{i-1} \tag{22}$$
where $\mathcal{B} = E\,\boldsymbol{\mathcal{B}}_i$ is shown in Table II and

$$\mathcal{R} \triangleq E\,\boldsymbol{\mathcal{R}}_i = \mathrm{diag}\{R_{u,1}, R_{u,2}, \ldots, R_{u,N}\}. \tag{23}$$

The necessary and sufficient condition to ensure mean stability of the network (namely, $E\,\tilde{\boldsymbol{w}}_i \to 0$ as
i — > oo) is therefore to select step-sizes {pk} that ensure Q: 



$$\rho(\mathcal{B}) < 1 \tag{24}$$

where $\rho(\cdot)$ denotes the spectral radius of its matrix argument. Note that the coefficient matrices $\{\mathcal{B}\}$
that control the evolution of $E\,\tilde{\boldsymbol{w}}_i$ are different in the cases listed in Table II. These differences lead to
interesting conclusions.

B.1. Comparison of Mean Stability: To begin with, the matrix $\mathcal{B}$ is block diagonal in the non-cooperative
case and equal to

$$\mathcal{B}_{\mathrm{ncop}} = I_{NM} - \mathcal{M}\mathcal{R}. \tag{25}$$

Therefore, for each of the individual nodes to be stable in the mean, it is necessary and sufficient that
the step-sizes $\{\mu_k\}$ be selected to satisfy

$$\rho(\mathcal{B}_{\mathrm{ncop}}) = \max_{1 \le k \le N} \rho(I_M - \mu_k R_{u,k}) < 1 \tag{26}$$

since the matrices $\mathcal{M}$ from (17) and $\mathcal{R}$ from (23) are block diagonal. Condition (26) is equivalent to

$$0 < \mu_k < \frac{2}{\lambda_{\max}(R_{u,k})} \quad \text{for } k = 1, 2, \ldots, N \qquad \text{(stability in the non-cooperative case)} \tag{27}$$

where $\lambda_{\max}(\cdot)$ denotes the maximum eigenvalue of its Hermitian matrix argument. Condition (27)
guarantees that when each node acts individually and applies the LMS recursion (4), then the mean
of its weight error vector will tend asymptotically to zero. That is, by selecting the step-sizes to satisfy
(27), all individual nodes will be stable in the mean.

Now consider the matrix $\mathcal{B}$ in the consensus case; it is equal to

$$\mathcal{B}_{\mathrm{cons}} = \mathcal{A}^T - \mathcal{M}\mathcal{R}. \tag{28}$$

It is seen in this case that the stability of $\mathcal{B}_{\mathrm{cons}}$ depends on $A$. The fact that the stability of the consensus
strategy is sensitive to the choice of the combination matrix is known in the consensus literature for
the conventional implementation for computing averages, which does not involve streaming data or
gradient noise [6], [46]. Here, we are studying the more demanding case of the single time-scale consensus
iteration (8) in the presence of both noisy and streaming data. It is clear from (28) that the choice of
$A$ can destroy the stability of the consensus network even when the step-sizes are chosen according to
(27) and all nodes are stable on their own. This behavior does not occur for diffusion networks where
the matrices $\{\mathcal{B}\}$ for the ATC and CTA diffusion strategies are instead given by

$$\mathcal{B}_{\mathrm{atc}} = \mathcal{A}^T (I_{NM} - \mathcal{M}\mathcal{R}) \quad \text{and} \quad \mathcal{B}_{\mathrm{cta}} = (I_{NM} - \mathcal{M}\mathcal{R}) \mathcal{A}^T. \tag{29}$$

The following result clarifies these statements.

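Theorem 1 (Spectral properties of $\mathcal{B}$). It holds that

$$\rho(\mathcal{B}_{\mathrm{atc}}) = \rho(\mathcal{B}_{\mathrm{cta}}) \le \rho(\mathcal{B}_{\mathrm{ncop}}) \tag{30}$$

irrespective of the choice of the left-stochastic matrices $A$. Moreover, if the combination matrix $A$ is
symmetric, then the eigenvalues of $\mathcal{B}_{\mathrm{cons}}$ are less than or equal to the corresponding eigenvalues of
$\mathcal{B}_{\mathrm{ncop}}$, i.e.,

$$\lambda_l(\mathcal{B}_{\mathrm{cons}}) \le \lambda_l(\mathcal{B}_{\mathrm{ncop}}) \quad \text{for } l = 1, 2, \ldots, NM \tag{31}$$

where the eigenvalues $\{\lambda_l(\cdot)\}$ are arranged in decreasing order, i.e., $\lambda_{l_1}(\cdot) \ge \lambda_{l_2}(\cdot)$ if $l_1 \le l_2$.

Proof: See Appendix A. ■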
Result (30) establishes the important conclusion that the coefficient matrix $\mathcal{B}$ for the diffusion strategies
is stable whenever $\mathcal{B}_{\mathrm{ncop}}$ (or, from (26), each of the matrices $\{I_M - \mu_k R_{u,k}\}$) is stable; this conclusion
is independent of $A$. The stability of the matrices $\{I_M - \mu_k R_{u,k}\}$ is ensured by any step-size satisfying
(27). Therefore, stability of the individual nodes will always guarantee the stability of $\mathcal{B}$ in the ATC and
CTA diffusion cases, regardless of the choice of $A$. This is not the case for the consensus strategy (8);
even when the step-sizes $\{\mu_k\}$ are selected to satisfy (27) so that all individual nodes are mean stable, the
matrix $\mathcal{B}_{\mathrm{cons}}$ can still be unstable depending on the choice of $A$ (and, therefore, on the network topology
as well). Therefore, if we start from a collection of nodes that are behaving in a stable manner on their
own, and if we connect them through a topology and then apply consensus to solve the same estimation
problem through cooperation, then the network may end up being unstable and the estimation task can
fail drastically (see Fig. 2 further ahead). Moreover, it is further shown in Appendix A that when $A$ is
symmetric, the consensus strategy is mean-stable for step-sizes satisfying:

$$0 < \mu_k < \frac{1 + \lambda_{\min}(A)}{\lambda_{\max}(R_{u,k})} \quad \text{for } k = 1, 2, \ldots, N. \tag{32}$$

Note from (9) that since $A$ is a left-stochastic matrix, its spectral radius is equal to one and one of its
eigenvalues is also equal to one [47], i.e., $\lambda_1(A) = \rho(A) = 1$. This implies that the upper bound in (32)
is no larger than the upper bound in (27) so that diffusion networks are stable over a wider range of step-sizes.
Actually, the upper bound in (32) can be much smaller than the one in (27) or even zero because $\lambda_{\min}(A)$
can be negative or equal to $-1$.
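To see how quickly the sufficient range (32) can shrink relative to (27), the following sketch evaluates both upper bounds for a symmetric two-node combination matrix with off-diagonal weight $a$, whose eigenvalues are $1$ and $1 - 2a$. The function names and the choice $\lambda_{\max}(R_{u,k}) = 1$ are illustrative assumptions.

```python
def noncoop_upper_bound(lam_max_R):
    """Upper bound on mu_k in (27)."""
    return 2.0 / lam_max_R


def consensus_upper_bound(lam_min_A, lam_max_R):
    """Upper bound on mu_k in (32), valid for symmetric A."""
    return (1.0 + lam_min_A) / lam_max_R


def lam_min_symmetric_two_node(a):
    """Smallest eigenvalue of [[1 - a, a], [a, 1 - a]], i.e. min(1, 1 - 2a)."""
    return min(1.0, 1.0 - 2.0 * a)


lam_max_R = 1.0  # assumed regressor power
bounds = {
    a: (consensus_upper_bound(lam_min_symmetric_two_node(a), lam_max_R),
        noncoop_upper_bound(lam_max_R))
    for a in (0.0, 0.25, 0.5, 0.85, 1.0)
}
```

For $a = 0$ (no cooperation) the two bounds coincide, while for $a = 1$ the range in (32) collapses to an empty interval because $\lambda_{\min}(A) = -1$.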

What if some of the nodes are unstable in the mean to begin with? How would the behavior of the
diffusion and consensus strategies differ? Assume that there is at least one individual unstable node, i.e.,
$\lambda_l(\mathcal{B}_{\mathrm{ncop}}) \le -1$ for some $l$ so that $\rho(\mathcal{B}_{\mathrm{ncop}}) \ge 1$. Then, we observe from (30) that the spectral radius of
$\mathcal{B}_{\mathrm{atc}}$ can still be smaller than one even if $\rho(\mathcal{B}_{\mathrm{ncop}}) \ge 1$. It follows that even if some individual node is
unstable, the diffusion strategies can still be stable if we properly choose $A$. In other words, diffusion
cooperation has a stabilizing effect on the network. In contrast, if there is at least one individual unstable
node and the combination matrix $A$ is symmetric, then from (31), no matter how we choose $A$,
$\rho(\mathcal{B}_{\mathrm{cons}})$ will be larger than or equal to one and the consensus network will be unstable.

The above results suggest that fusing results from neighborhoods according to the consensus strategy
(8) is not necessarily the best thing to do because it can lead to instability and catastrophic failure. On the
other hand, fusing the results from neighbors via diffusion ensures stability regardless of the topology
whenever the individual nodes are stable.

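B.2. Example: Two-Node Networks: To illustrate these important observations, let us consider an
example consisting of two cooperating nodes; in this case, it is possible to carry out the calculations
analytically in order to highlight the various patterns of behavior. Later, in the simulations section, we
illustrate the behavior for networks with multiple nodes. Thus, consider a network consisting of $N = 2$
nodes. For simplicity, we assume the weight vector $w^o$ is a scalar, and $R_{u,1} = \sigma_{u,1}^2$ and $R_{u,2} = \sigma_{u,2}^2$.
Without loss of generality, we assume $\mu_1 \sigma_{u,1}^2 \le \mu_2 \sigma_{u,2}^2$. The combination matrix for this example is of
the form (Fig. 2):

$$A^T = \begin{bmatrix} 1 - a & a \\ b & 1 - b \end{bmatrix} \tag{33}$$

with $a, b \in [0, 1]$. When desired, a symmetric $A$ can be selected by simply setting $a = b$. Then, using
(33), we get

$$\mathcal{B}_{\mathrm{atc}} = \begin{bmatrix} (1 - \mu_1 \sigma_{u,1}^2)(1 - a) & (1 - \mu_2 \sigma_{u,2}^2)\, a \\ (1 - \mu_1 \sigma_{u,1}^2)\, b & (1 - \mu_2 \sigma_{u,2}^2)(1 - b) \end{bmatrix} \tag{34}$$

$$\mathcal{B}_{\mathrm{cons}} = \begin{bmatrix} 1 - a - \mu_1 \sigma_{u,1}^2 & a \\ b & 1 - b - \mu_2 \sigma_{u,2}^2 \end{bmatrix} \tag{35}$$

We first assume that

$$0 < \mu_1 \sigma_{u,1}^2 \le \mu_2 \sigma_{u,2}^2 < 2 \tag{36}$$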
so that both individual nodes are stable in the mean by virtue of (27). Then, by Theorem 1, the ATC
diffusion network will also be stable in the mean for any choice of the parameters $\{a, b\}$. We now verify
that there are choices for $\{a, b\}$ that will turn the consensus network unstable. Specifically, we verify
below that if $a$ and $b$ happen to satisfy

$$a + b \ge 2 - \mu_1 \sigma_{u,1}^2 \tag{37}$$

then consensus will lead to unstable network behavior even though both individual nodes are stable.
Indeed, note first that the minimum eigenvalue of $\mathcal{B}_{\mathrm{cons}}$ is given by:

$$\lambda_{\min}(\mathcal{B}_{\mathrm{cons}}) = \frac{(2 - a - b - \mu_1 \sigma_{u,1}^2 - \mu_2 \sigma_{u,2}^2) - \sqrt{D}}{2} \tag{38}$$

where

$$\begin{aligned} D &\triangleq (-a + b - \mu_1 \sigma_{u,1}^2 + \mu_2 \sigma_{u,2}^2)^2 + 4ab \\ &= (a + b + \mu_1 \sigma_{u,1}^2 - \mu_2 \sigma_{u,2}^2)^2 + 4b(\mu_2 \sigma_{u,2}^2 - \mu_1 \sigma_{u,1}^2). \end{aligned} \tag{39}$$

From the first equality of (39), we know that $D \ge 0$ and, hence, $\lambda_{\min}(\mathcal{B}_{\mathrm{cons}})$ is real. When (36)-(37) are
satisfied, we have that $(a + b + \mu_1 \sigma_{u,1}^2 - \mu_2 \sigma_{u,2}^2)$ and $4b(\mu_2 \sigma_{u,2}^2 - \mu_1 \sigma_{u,1}^2)$ in the second equality of (39)
are nonnegative, so that $\sqrt{D} \ge a + b + \mu_1 \sigma_{u,1}^2 - \mu_2 \sigma_{u,2}^2$. It follows that the consensus network is unstable since

$$\lambda_{\min}(\mathcal{B}_{\mathrm{cons}}) \le \frac{(2 - a - b - \mu_1 \sigma_{u,1}^2 - \mu_2 \sigma_{u,2}^2) - (a + b + \mu_1 \sigma_{u,1}^2 - \mu_2 \sigma_{u,2}^2)}{2} = 1 - a - b - \mu_1 \sigma_{u,1}^2 \le -1. \tag{40}$$

In Fig. 2(a), we set $\mu_1 \sigma_{u,1}^2 = 0.4$ and $\mu_2 \sigma_{u,2}^2 = 0.6$ so that each individual node is stable. If we now set
$a = b = 0.85$, then (37) is satisfied and the consensus strategy becomes unstable.
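The numbers used for Fig. 2(a) can be checked directly. The sketch below builds the $2 \times 2$ matrices in (34) and (35), together with the non-cooperative matrix, and computes their spectral radii in closed form. The helper names are hypothetical, and `p`, `q` stand for $\mu_1 \sigma_{u,1}^2$ and $\mu_2 \sigma_{u,2}^2$.

```python
import cmath


def eig2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (m11, m12), (m21, m22) = m
    tr, det = m11 + m22, m11 * m22 - m12 * m21
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


def spectral_radius(m):
    return max(abs(z) for z in eig2(m))


def two_node_matrices(p, q, a, b):
    """Return (B_ncop, B_cons, B_atc) for the scalar two-node example."""
    b_ncop = [[1.0 - p, 0.0], [0.0, 1.0 - q]]
    b_cons = [[1.0 - a - p, a], [b, 1.0 - b - q]]                 # (35)
    b_atc = [[(1.0 - p) * (1.0 - a), (1.0 - q) * a],
             [(1.0 - p) * b, (1.0 - q) * (1.0 - b)]]               # (34)
    return b_ncop, b_cons, b_atc


p, q, a, b = 0.4, 0.6, 0.85, 0.85  # values quoted for Fig. 2(a)
ncop, cons, atc = two_node_matrices(p, q, a, b)
rho_ncop = spectral_radius(ncop)
rho_cons = spectral_radius(cons)
rho_atc = spectral_radius(atc)
lam_min_cons = min(z.real for z in eig2(cons))
```

Under these values the consensus matrix has spectral radius of roughly 1.21, while the ATC matrix stays below the non-cooperative radius of 0.6, in agreement with (30) and (40).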
Next, we consider an example satisfying

$$0 < \mu_1 \sigma_{u,1}^2 < 2 \le \mu_2 \sigma_{u,2}^2 \tag{41}$$

so that node 1 is still stable, whereas node 2 becomes unstable. From the first equality of (39), we again
conclude that

$$\lambda_{\min}(\mathcal{B}_{\mathrm{cons}}) \le \frac{(2 - a - b - \mu_1 \sigma_{u,1}^2 - \mu_2 \sigma_{u,2}^2) - |-a + b - \mu_1 \sigma_{u,1}^2 + \mu_2 \sigma_{u,2}^2|}{2} = \begin{cases} 1 - a - \mu_1 \sigma_{u,1}^2, & \text{if } a + \mu_1 \sigma_{u,1}^2 \ge b + \mu_2 \sigma_{u,2}^2 \\ 1 - b - \mu_2 \sigma_{u,2}^2, & \text{otherwise} \end{cases} \le -1. \tag{42}$$

That is, in this second case, no matter how we choose the parameters $\{a, b\}$, the consensus network is
always unstable. In contrast, the diffusion network is able to stabilize the network. To see this, we set
$b = 1 - a$ so that the eigenvalues of $\mathcal{B}_{\mathrm{atc}}$ in (34) are $\{0,\ 1 - \mu_1 \sigma_{u,1}^2 - (\mu_2 \sigma_{u,2}^2 - \mu_1 \sigma_{u,1}^2)\, a\}$. Some algebra
shows that the diffusion network is stable if $a$ satisfies

$$0 \le a < \frac{2 - \mu_1 \sigma_{u,1}^2}{\mu_2 \sigma_{u,2}^2 - \mu_1 \sigma_{u,1}^2}. \tag{43}$$

In Fig. 2(b), we set $\mu_1 \sigma_{u,1}^2 = 0.4$ and $\mu_2 \sigma_{u,2}^2 = 2.4$ so that node 1 is stable, but node 2 is unstable. If
we now set $a = 1 - b = 0.2$, then (43) is satisfied and the diffusion strategies become stable even when
the non-cooperative and consensus strategies are unstable.

Fig. 2. Transient network MSD over time with $N = 2$. (a) $\mu_1 \sigma_{u,1}^2 = 0.4$, $\mu_2 \sigma_{u,2}^2 = 0.6$, and $a = b = 0.85$. As seen in the
right plot, the consensus strategy is unstable even when the individual nodes are stable. (b) $\mu_1 \sigma_{u,1}^2 = 0.4$, $\mu_2 \sigma_{u,2}^2 = 2.4$, and
$a = 1 - b = 0.2$ so that node 2 is unstable. As seen in the right plot, the diffusion strategies are able to stabilize the network
even when the non-cooperative and consensus strategies are unstable.
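The second scenario can be explored numerically in the same spirit. The sketch below evaluates the closed-form minimum eigenvalue (38)-(39) of the consensus matrix over a grid of $(a, b)$ values, and scans $a$ with $b = 1 - a$ for the ATC matrix. The grid resolution and the function names are assumptions made for illustration; `p` and `q` again stand for $\mu_1 \sigma_{u,1}^2$ and $\mu_2 \sigma_{u,2}^2$.

```python
import math


def lambda_min_cons(p, q, a, b):
    """Minimum eigenvalue of B_cons from (38)-(39)."""
    D = (-a + b - p + q) ** 2 + 4.0 * a * b
    return ((2.0 - a - b - p - q) - math.sqrt(D)) / 2.0


def atc_nonzero_eig(p, q, a):
    """Nonzero eigenvalue of B_atc in (34) when b = 1 - a."""
    return 1.0 - p - (q - p) * a


p, q = 0.4, 2.4  # values quoted for Fig. 2(b): node 2 is unstable on its own
grid = [j / 20.0 for j in range(21)]  # a, b in {0, 0.05, ..., 1}

# Consensus: check every (a, b) pair on the grid against the bound in (42).
cons_always_unstable = all(
    lambda_min_cons(p, q, a, b) <= -1.0 for a in grid for b in grid
)

# ATC diffusion with b = 1 - a: keep the values of a that give a stable matrix.
a_max = (2.0 - p) / (q - p)  # right-hand side of (43)
stable_a = [a for a in grid if abs(atc_nonzero_eig(p, q, a)) < 1.0]
```

On this grid no choice of $(a, b)$ rescues the consensus network, whereas the ATC network is stable for the grid values of $a$ below `a_max`, including the value $a = 0.2$ used in Fig. 2(b).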

C. Mean-Square Stability 

We now examine the stability in the mean-square sense of the consensus and diffusion strategies. Let
$\Sigma$ denote an arbitrary nonnegative-definite matrix that we are free to choose. From (19), we get the
following weighted variance relation for sufficiently small step-sizes:

$$E\|\tilde{\boldsymbol{w}}_i\|_{\Sigma}^2 \approx E\|\tilde{\boldsymbol{w}}_{i-1}\|_{\mathcal{B}^* \Sigma \mathcal{B}}^2 + \mathrm{Tr}(\Sigma \mathcal{Y}) \tag{44}$$
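where the notation $\|x\|_{\Sigma}^2$ denotes the weighted square quantity $x^* \Sigma x$ and $\mathcal{Y} = E\,\boldsymbol{y}_i \boldsymbol{y}_i^*$ appears in Table
II with the covariance matrix $\mathcal{S}$ defined by:

$$\mathcal{S} \triangleq E\,\boldsymbol{s}_i \boldsymbol{s}_i^* = \mathrm{diag}\{\sigma_{v,1}^2 R_{u,1}, \sigma_{v,2}^2 R_{u,2}, \ldots, \sigma_{v,N}^2 R_{u,N}\}. \tag{45}$$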
As shown in Q, E71 . 11481 . step-sizes that satisfy (|24l and are sufficiently small will also ensure mean- 



square stability of the network (namely, E||u? 



$c < \infty$ as $i \to \infty$). Therefore, we find again that, for
infinitesimally small step-sizes, the mean-square stability of consensus networks is sensitive to the choice
of $A$, whereas the mean-square stability of diffusion networks is not affected by $A$. In the next section, we
will examine $\rho(\mathcal{B})$ more closely for the various strategies listed in Table II and establish that diffusion
networks are not only more stable than consensus networks but also lead to better mean-square-error
performance.



D. Mean-Square Deviation 

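The mean-square deviation (MSD) measure is used to assess how well the nodes in the network estimate the weight vector, $w^o$. The MSD at node $k$ is defined as follows:

$$\mathrm{MSD}_k \triangleq \lim_{i\to\infty} \mathbb{E}\|\tilde{w}_{k,i}\|^2 \qquad (46)$$

where $\|\cdot\|$ denotes the Euclidean norm for vectors. The network MSD is defined as the average MSD across the network, i.e.,

$$\mathrm{MSD} \triangleq \frac{1}{N}\sum_{k=1}^{N}\mathrm{MSD}_k. \qquad (47)$$

Iterating (44), we can obtain a series expression for the network MSD as:

$$\mathrm{MSD} = \frac{1}{N}\sum_{j=0}^{\infty}\mathrm{Tr}\left(\mathcal{B}^j\,\mathcal{Y}\,\mathcal{B}^{*j}\right). \qquad (48)$$

We can also obtain a series expansion for the MSD at each individual node $k$ as follows:

$$\mathrm{MSD}_k = \sum_{j=0}^{\infty}\mathrm{Tr}\left[(e_k^T \otimes I_M)\cdot\mathcal{B}^j\,\mathcal{Y}\,\mathcal{B}^{*j}\cdot(e_k \otimes I_M)\right] \qquad (49)$$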



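where $e_k$ denotes the $k$th column of the identity matrix $I_N$. Expressions (48)-(49) relate the MSDs directly to the quantities $\{\mathcal{B}, \mathcal{Y}\}$ from Table II.

TABLE III

Variables for cooperative and non-cooperative implementations when $\mu_k = \mu$ and $R_{u,k} = R_u$.

| | ATC diffusion (10) | CTA diffusion | Consensus (8) | Non-cooperative (4) |
|---|---|---|---|---|
| $\mathcal{B}$ | $A^T \otimes I_M - A^T \otimes \mu R_u$ | $A^T \otimes I_M - A^T \otimes \mu R_u$ | $A^T \otimes I_M - I_N \otimes \mu R_u$ | $I_{NM} - I_N \otimes \mu R_u$ |
| $\lambda_{l,m}(\mathcal{B})$ | $\lambda_l(A)(1 - \mu\lambda_m(R_u))$ | $\lambda_l(A)(1 - \mu\lambda_m(R_u))$ | $\lambda_l(A) - \mu\lambda_m(R_u)$ | $1 - \mu\lambda_m(R_u)$ |
| $\mathcal{Y}$ | $\mu^2 (A^T\Sigma_v A) \otimes R_u$ | $\mu^2\,\Sigma_v \otimes R_u$ | $\mu^2\,\Sigma_v \otimes R_u$ | $\mu^2\,\Sigma_v \otimes R_u$ |
| $s_{l,m}^{b*}\,\mathcal{Y}\,s_{l,m}^{b}$ | $\mu^2\lambda_m(R_u)\,\lvert\lambda_l(A)\rvert^2 \cdot s_l^*\Sigma_v s_l$ | $\mu^2\lambda_m(R_u) \cdot s_l^*\Sigma_v s_l$ | $\mu^2\lambda_m(R_u) \cdot s_l^*\Sigma_v s_l$ | $\mu^2\lambda_m(R_u) \cdot s_l^*\Sigma_v s_l$ |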


IV. Comparison of Mean-Square Performance for Homogeneous Agents 

In the previous section, we compared the stability of the various estimation strategies in the mean and mean-square senses. In particular, we established that stability of the individual nodes ensures stability of diffusion networks irrespective of the combination topology. In the sequel, we assume that the step-sizes are sufficiently small so that conditions (27) and (32) hold and the diffusion and consensus networks, as well as the individual nodes, are stable in the mean and mean-square senses. Under these conditions, the networks achieve steady-state operation. We now use the MSD expressions derived above to establish that ATC diffusion achieves lower (and, hence, better) MSD values than the consensus, CTA, and non-cooperative strategies. In this way, diffusion strategies not only ensure stability of the cooperative behavior but also lead to improved mean-square-error performance. We establish these results under the following reasonable condition.

Assumption 1. All nodes in the network use the same step-size, $\mu_k = \mu$, and they observe regression data with the same covariance matrix so that $R_{u,k} = R_u$ for all $k$. In other words, we are dealing with a network of homogeneous nodes interacting with each other. In this way, it is possible to quantify the differences in performance without biasing the results by differences in the adaptation mechanism (step-sizes) or in the covariance matrices of the regression data at the nodes.

Under Assumption 1, it holds that $\mathcal{M} = \mu I_{NM}$ and $\mathcal{R} = I_N \otimes R_u$, and thus the matrices $\mathcal{B}$ and $\mathcal{Y}$ in Table II reduce to the expressions shown in Table III, where we introduced the diagonal matrix

$$\Sigma_v \triangleq \mathrm{diag}\{\sigma_{v,1}^2,\ \sigma_{v,2}^2,\ \ldots,\ \sigma_{v,N}^2\} > 0. \qquad (50)$$

Note that the ATC and CTA diffusion strategies now have the same coefficient matrix $\mathcal{B}$. We explain in the sequel the terms that appear in the last row of Table III.
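As a rough numerical illustration of how the Table III quantities enter the MSD, the sketch below truncates a series of the form $\frac{1}{N}\sum_j \mathrm{Tr}(\mathcal{B}^j\mathcal{Y}\mathcal{B}^{*j})$ for scalar regressors ($M = 1$, $R_u = \sigma_u^2$) and real matrices. The symmetric two-node combination matrix, the noise variances, the truncation length, and all function names are assumptions chosen for illustration only.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def scale(c, x):
    return [[c * v for v in row] for row in x]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def network_msd(b, y, terms=3000):
    """Truncated series (1/N) * sum_j Tr(B^j Y B^{Tj}) for real B, M = 1."""
    n = len(b)
    bt = transpose(b)
    term, total = [row[:] for row in y], 0.0
    for _ in range(terms):
        total += sum(term[i][i] for i in range(n))
        term = matmul(matmul(b, term), bt)
    return total / n


def table3_msd(a_mat, mu, sigma_u2, noise_vars):
    """Network MSD of the four strategies using the B and Y of Table III."""
    n = len(a_mat)
    at = transpose(a_mat)
    sigma_v = diag(noise_vars)
    delta = mu * sigma_u2
    y_plain = scale(mu ** 2 * sigma_u2, sigma_v)
    y_atc = scale(mu ** 2 * sigma_u2, matmul(matmul(at, sigma_v), a_mat))
    b_diff = scale(1.0 - delta, at)                      # ATC and CTA
    b_cons = [[at[i][j] - (delta if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    b_ncop = diag([1.0 - delta] * n)
    return {"atc": network_msd(b_diff, y_atc),
            "cta": network_msd(b_diff, y_plain),
            "cons": network_msd(b_cons, y_plain),
            "ncop": network_msd(b_ncop, y_plain)}


msd = table3_msd([[0.7, 0.3], [0.3, 0.7]], mu=0.1, sigma_u2=1.0,
                 noise_vars=[0.1, 0.2])
print(msd)
```

For the non-cooperative case the truncated series can be compared with the closed form $\frac{1}{N}\sum_k \mu^2\sigma_u^2\sigma_{v,k}^2/(1-(1-\mu\sigma_u^2)^2)$, which gives a simple sanity check on the truncation length.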
A. Spectral Properties of B

As mentioned before, the stability and mean-square-error performance of the various algorithms depend on the corresponding matrix $\mathcal{B}$; therefore, in this section, we examine more closely the eigen-structure of $\mathcal{B}$. For the distributed strategies (diffusion and consensus), the eigen-structure of $\mathcal{B}$ will depend on the combination matrix $A$. Thus, let $r_l$ and $s_l$ ($l = 1, 2, \ldots, N$) denote an arbitrary pair of right and left eigenvectors of $A^T$ corresponding to the eigenvalue $\lambda_l(A)$. That is,

$$A^T r_l = \lambda_l(A)\,r_l \quad \text{and} \quad s_l^* A^T = \lambda_l(A)\,s_l^*. \qquad (51)$$

We scale the vectors $r_l$ and $s_l$ to satisfy:

$$\|r_l\| = 1 \quad \text{and} \quad s_l^* r_l = 1 \quad \text{for all } l. \qquad (52)$$

Recall that $\lambda_1(A) = \rho(A) = 1$. Furthermore, we let $z_m$ ($m = 1, 2, \ldots, M$) denote the eigenvector of the covariance matrix $R_u$ that is associated with the eigenvalue $\lambda_m(R_u)$. That is,

$$R_u z_m = \lambda_m(R_u)\,z_m. \qquad (53)$$

Since $R_u$ is Hermitian and positive-definite, the $\{z_m\}$ are orthonormal, i.e., $z_{m_2}^* z_{m_1} = \delta_{m_1 m_2}$, and the $\{\lambda_m(R_u)\}$ are positive. The following result describes the eigen-structure of the matrix $\mathcal{B}$ in terms of the eigen-structures of $\{A^T, R_u\}$ for the diffusion and consensus algorithms of Table III. Note that the results for any of these distributed strategies collapse to the result for the non-cooperative strategy when we set $\lambda_l(A) = 1$ for all $l$.

Lemma 1 (Eigen-structure of B under diffusion and consensus). The matrices $\{\mathcal{B}\}$ appearing in Table III for the diffusion and consensus strategies have right and left eigenvectors $\{r_{l,m}^b, s_{l,m}^b\}$ given by:

$$r_{l,m}^b = r_l \otimes z_m \quad \text{and} \quad s_{l,m}^b = s_l \otimes z_m \qquad (54)$$

with the corresponding eigenvalues, $\lambda_{l,m}(\mathcal{B})$, shown in Table III for $l = 1, 2, \ldots, N$ and $m = 1, 2, \ldots, M$. Note that while the eigenvectors are the same for the diffusion and consensus strategies, the corresponding eigenvalues are different.

Proof: We only consider the diffusion case and denote its coefficient matrix by $\mathcal{B}_{\mathrm{diff}} = A^T \otimes I_M - A^T \otimes \mu R_u$; the same argument applies to the consensus strategy. We multiply $\mathcal{B}_{\mathrm{diff}}$ by the $r_{l,m}^b$ defined in (54) from the right and obtain

$$\begin{aligned}
\mathcal{B}_{\mathrm{diff}} \cdot r_{l,m}^b &= (A^T \otimes I_M - A^T \otimes \mu R_u)\cdot(r_l \otimes z_m) \\
&= \lambda_l(A)\cdot(r_l \otimes z_m) - \lambda_l(A)\cdot\mu\lambda_m(R_u)\cdot(r_l \otimes z_m) \\
&= \lambda_l(A)(1 - \mu\lambda_m(R_u))\cdot r_{l,m}^b
\end{aligned} \qquad (55)$$

where we used the Kronecker product property $(A \otimes B)(C \otimes D) = AC \otimes BD$ for matrices $\{A, B, C, D\}$ of compatible dimensions [35]. In a similar manner, we can verify that $\mathcal{B}_{\mathrm{diff}}$ has left eigenvector $s_{l,m}^b$ defined in (54) with the corresponding eigenvalue $\lambda_{l,m}(\mathcal{B})$ from Table III. ■

Theorem 2 (Spectral radius of B under diffusion and consensus). Under Assumption 1, it holds that

$$\rho(\mathcal{B}_{\mathrm{diff}}) = \rho(\mathcal{B}_{\mathrm{ncop}}) \le \rho(\mathcal{B}_{\mathrm{cons}}) \qquad (56)$$

where equality holds if $A = I_N$ or when the step-size satisfies:

$$0 < \mu \le \min_{l \ne 1}\frac{1 - |\lambda_l(A)|}{\lambda_{\min}(R_u) + \lambda_{\max}(R_u)}. \qquad (57)$$

Proof: See Appendix B. ■

Note that the upper bound in (57) is even smaller than the one in (32) and, therefore, can again be very small or even zero. It follows that there is generally a wide range of step-sizes over which $\rho(\mathcal{B}_{\mathrm{cons}})$ is greater than $\rho(\mathcal{B}_{\mathrm{diff}})$. When this happens, the convergence rate of diffusion networks is superior to the convergence rate of consensus networks; in particular, the quantities $\mathbb{E}\,\tilde{w}_i$ and $\mathbb{E}\|\tilde{w}_i\|^2$ will converge faster towards their steady-state values over diffusion networks than over consensus networks.
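The comparison of spectral radii can be explored numerically from the eigenvalue formulas of Table III alone. The sketch below is illustrative: the eigenvalue lists passed in are made-up inputs, the function names are not from the paper, and the step-size bound is computed under the reading of (57) as a minimum over $l \ne 1$ of $(1 - |\lambda_l(A)|)/(\lambda_{\min}(R_u) + \lambda_{\max}(R_u))$.

```python
def spectral_radii(eig_a, eig_ru, mu):
    """Spectral radii (diffusion, consensus, non-cooperative) from Table III.

    eig_a:  eigenvalues of A (possibly complex); must contain 1.
    eig_ru: positive eigenvalues of R_u.
    """
    diff = max(abs(la * (1.0 - mu * lr)) for la in eig_a for lr in eig_ru)
    cons = max(abs(la - mu * lr) for la in eig_a for lr in eig_ru)
    ncop = max(abs(1.0 - mu * lr) for lr in eig_ru)
    return diff, cons, ncop


def equality_step_size_bound(eig_a, eig_ru):
    """Assumed reading of the right-hand side of (57)."""
    others = list(eig_a)
    others.remove(1)                      # drop lambda_1(A) = 1
    spread = min(eig_ru) + max(eig_ru)
    return min((1.0 - abs(la)) / spread for la in others)


eig_a = [1.0, 0.4, -0.3]      # hypothetical eigenvalues of A
eig_ru = [2.0, 3.0, 4.0]      # hypothetical eigenvalues of R_u

bound = equality_step_size_bound(eig_a, eig_ru)
small = spectral_radii(eig_a, eig_ru, mu=0.02)   # below the bound
large = spectral_radii(eig_a, eig_ru, mu=0.3)    # above the bound
print(bound, small, large)
```

With these inputs the three radii coincide for the small step-size, while for the larger one the consensus radius exceeds one even though the non-cooperative (and diffusion) radius stays below one.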

B. Network MSD Performance 

We now compare the MSD performance. Note that the expressions for the individual MSD in (49) and the network MSD in (48) depend on $\mathcal{B}$ in a nontrivial manner. To simplify these MSD expressions, we introduce the following assumption on the combination matrix.

Assumption 2. The combination matrix $A$ is diagonalizable, i.e., there exists an invertible matrix $U$ and a diagonal matrix $\Lambda$ such that

$$A^T = U \Lambda U^{-1} \qquad (58)$$

with

$$U = [\,r_1 \ \ r_2 \ \ \cdots \ \ r_N\,], \qquad U^{-1} = \mathrm{col}\{s_1^*, s_2^*, \ldots, s_N^*\} \qquad (59)$$

$$\Lambda = \mathrm{diag}\{\lambda_1(A), \lambda_2(A), \ldots, \lambda_N(A)\}. \qquad (60)$$

That is, the columns of $U$ consist of the right eigenvectors of $A^T$ and the rows of $U^{-1}$ consist of the left eigenvectors of $A^T$, as defined by (51).

Note that, besides condition (52), it follows from Assumption 2 that $s_{l_1}^* r_{l_2} = \delta_{l_1 l_2}$. Furthermore, any symmetric combination matrix $A$ is diagonalizable and therefore satisfies condition (58) automatically. Actually, when $A$ is symmetric, more can be said about its eigenvectors. In that case, the matrix $U$ will be orthogonal so that $U^{-1} = U^T$ and it will further hold that $r_{l_1}^* r_{l_2} = \delta_{l_1 l_2}$. Assumption 2 allows the analysis to apply to important cases in which $A$ is not necessarily symmetric but is still diagonalizable (such as when $A$ is constructed according to the uniform rule by assigning to the links of node $k$ weights that are equal to the inverse of its degree, $n_k$). We can now simplify the MSD expressions by using the eigen-decomposition of $\mathcal{B}$ from Lemma 1 and the above eigen-decomposition of $A$.



Lemma 2 (MSD expressions). The MSD at node $k$ from (49) can be expressed as:

$$\mathrm{MSD}_k = \sum_{l_1=1}^{N}\sum_{l_2=1}^{N}\sum_{m=1}^{M}\frac{(r_{l_2}^* e_k e_k^T r_{l_1})\cdot s_{l_1,m}^{b*}\,\mathcal{Y}\,s_{l_2,m}^{b}}{1 - \lambda_{l_1,m}(\mathcal{B})\,\lambda_{l_2,m}^*(\mathcal{B})}. \qquad (61)$$

Furthermore, if the right eigenvectors $\{r_l\}$ of $A^T$ are approximately orthonormal, i.e.,

$$r_{l_1}^* r_{l_2} \approx \delta_{l_1 l_2} \qquad (62)$$

then the network MSD from (48) can be approximated by:

$$\mathrm{MSD} \approx \frac{1}{N}\sum_{l=1}^{N}\sum_{m=1}^{M}\frac{s_{l,m}^{b*}\,\mathcal{Y}\,s_{l,m}^{b}}{1 - |\lambda_{l,m}(\mathcal{B})|^2}. \qquad (63)$$

Proof: See Appendix C. ■

Note that any symmetric combination matrix $A$ satisfies condition (62) since, as mentioned above, its right eigenvectors can be chosen to be orthonormal.

Using the expressions for $\lambda_{l,m}(\mathcal{B})$ and $s_{l,m}^{b*}\mathcal{Y}s_{l,m}^{b}$ from Table III and substituting into (63), we can obtain the network MSD expressions for the various strategies. The following result shows how these MSD values compare to each other.



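Theorem 3 (Comparing network MSDs). If condition (62) is satisfied, then the ATC diffusion strategy achieves the lowest network MSD in comparison to the other strategies (CTA diffusion, consensus, and non-cooperative). More specifically, it holds that

$$\mathrm{MSD}_{\mathrm{atc}} \le \mathrm{MSD}_{\mathrm{cta}} \le \mathrm{MSD}_{\mathrm{ncop}} \qquad (64)$$

$$\mathrm{MSD}_{\mathrm{atc}} \le \mathrm{MSD}_{\mathrm{cons}}. \qquad (65)$$

Furthermore, if $1 \le \mu\lambda_{\min}(R_u) < 2$, the consensus strategy is the worst even in comparison to the non-cooperative strategy:

$$\mathrm{MSD}_{\mathrm{atc}} \le \mathrm{MSD}_{\mathrm{cta}} \le \mathrm{MSD}_{\mathrm{ncop}} \le \mathrm{MSD}_{\mathrm{cons}}. \qquad (66)$$

Proof: See Appendix D. ■

Fig. 3. Network MSD comparison with $N = 2$ and $\mu\sigma_u^2 = 0.4$. The consensus strategy is unstable when the parameters $a$ and $b$ lie above the dashed line in region I. Region labels in the figure:
I: $\mathrm{MSD}_{\mathrm{atc}} < \mathrm{MSD}_{\mathrm{cta}} < \mathrm{MSD}_{\mathrm{ncop}} < \mathrm{MSD}_{\mathrm{cons}}$
II: $\mathrm{MSD}_{\mathrm{atc}} < \mathrm{MSD}_{\mathrm{cta}} < \mathrm{MSD}_{\mathrm{cons}} < \mathrm{MSD}_{\mathrm{ncop}}$
III: $\mathrm{MSD}_{\mathrm{atc}} < \mathrm{MSD}_{\mathrm{cons}} < \mathrm{MSD}_{\mathrm{cta}} < \mathrm{MSD}_{\mathrm{ncop}}$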
Therefore, the ATC diffusion strategy outperforms consensus, CTA diffusion, and non-cooperative strategies when condition (62) is satisfied. However, the relation among $\mathrm{MSD}_{\mathrm{cta}}$, $\mathrm{MSD}_{\mathrm{cons}}$, and $\mathrm{MSD}_{\mathrm{ncop}}$ depends on the combination matrix $A$. To illustrate this fact, we reconsider the two-node network from Section III.B with $\sigma_{u,1}^2 = \sigma_{u,2}^2 = \sigma_u^2$, $\mu_1 = \mu_2 = \mu$, and $0 < \mu\sigma_u^2 < 1$. Furthermore, to ensure the stability of the consensus strategy and from (37), the parameters $\{a, b\}$ in (33) are now chosen to satisfy $a + b < 2 - \mu\sigma_u^2$. In this case, the eigenvalues of the combination matrix $A$ in (33) are $\{1,\ 1 - a - b\}$. It can be verified from (63) and Table III that the network MSDs of the CTA diffusion and consensus strategies have the following relation:

$$\begin{cases}
\mathrm{MSD}_{\mathrm{cons}} \le \mathrm{MSD}_{\mathrm{cta}}, & \text{if } 0 \le a + b \le \dfrac{2 - 2\mu\sigma_u^2}{2 - \mu\sigma_u^2} \\[2ex]
\mathrm{MSD}_{\mathrm{cons}} > \mathrm{MSD}_{\mathrm{cta}}, & \text{if } \dfrac{2 - 2\mu\sigma_u^2}{2 - \mu\sigma_u^2} < a + b < 2 - \mu\sigma_u^2
\end{cases} \qquad (67)$$

Similarly, the network MSDs of the consensus and non-cooperative strategies have the following relation:

$$\begin{cases}
\mathrm{MSD}_{\mathrm{cons}} \le \mathrm{MSD}_{\mathrm{ncop}}, & \text{if } 0 \le a + b \le 2(1 - \mu\sigma_u^2) \\
\mathrm{MSD}_{\mathrm{cons}} > \mathrm{MSD}_{\mathrm{ncop}}, & \text{if } 2(1 - \mu\sigma_u^2) < a + b < 2 - \mu\sigma_u^2
\end{cases} \qquad (68)$$

Combining (67)-(68), we can divide the $a \times b$ plane into three regions, as shown in Fig. 3, where each region corresponds to one possible relation among $\mathrm{MSD}_{\mathrm{cta}}$, $\mathrm{MSD}_{\mathrm{cons}}$, and $\mathrm{MSD}_{\mathrm{ncop}}$.
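The three regions can be reproduced with a few lines of code. The sketch below relies on approximation (63) and Table III: for the homogeneous two-node network the $l = 1$ terms and the weights $s_l^*\Sigma_v s_l$ are common to the CTA, consensus, and non-cooperative strategies, so the ordering is decided by $|\lambda_{2,m}(\mathcal{B})|$ alone. The function name and the returned strings are illustrative choices, not notation from the paper.

```python
def two_node_region(a, b, delta):
    """Order CTA, consensus and non-cooperative network MSDs (smallest first).

    delta stands for mu * sigma_u^2, assumed to lie in (0, 1).
    Uses only the l = 2 eigenvalue of B, as suggested by (63) and Table III.
    """
    lam2 = 1.0 - a - b                       # second eigenvalue of A
    if abs(lam2 - delta) >= 1.0:
        return "consensus unstable"
    # The term 1 / (1 - |lambda|^2) grows with |lambda|, so sorting the
    # magnitudes sorts the corresponding MSD contributions.
    mags = {
        "cta": abs(lam2 * (1.0 - delta)),
        "cons": abs(lam2 - delta),
        "ncop": abs(1.0 - delta),
    }
    return " <= ".join(sorted(mags, key=mags.get))


for a, b in [(0.3, 0.3), (0.5, 0.5), (0.7, 0.7), (0.9, 0.9)]:
    print(a + b, two_node_region(a, b, delta=0.4))
```

For $\mu\sigma_u^2 = 0.4$ the switching points are $a + b = 0.75$, $1.2$, and $1.6$, which matches the thresholds obtained by comparing $|\lambda_2(A) - \mu\sigma_u^2|$ with $|\lambda_2(A)|(1 - \mu\sigma_u^2)$ and with $1 - \mu\sigma_u^2$.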
C. MSD of Individual Nodes

In Theorem 3, we established that the ATC diffusion strategy performs the best in terms of the average network MSD. It is still not clear how well the individual nodes perform under each strategy. It is generally more challenging to compare diffusion and consensus strategies in terms of the MSDs of their individual nodes due to the structure of the matrix $\mathcal{B}$ for the consensus strategy. Nevertheless, this can be accomplished as follows. We observe from (61) and Table III that the $\{\mathrm{MSD}_k\}$ for the CTA diffusion and consensus strategies differ only in the value of $\lambda_{l,m}(\mathcal{B})$. From Table III, the difference between the values of $\lambda_{l,m}(\mathcal{B})$ for these two strategies is

$$\lambda_{l,m}(\mathcal{B}_{\mathrm{cta}}) - \lambda_{l,m}(\mathcal{B}_{\mathrm{cons}}) = \mu\lambda_m(R_u)\cdot(1 - \lambda_l(A)) = O(\mu) \qquad (69)$$

where the term $O(\mu)$ denotes a factor that is of the order of the step-size $\mu$. It follows that for sufficiently small step-sizes, expression (69) is close to zero and the CTA diffusion and consensus strategies will exhibit similar MSDs at the individual nodes, i.e., $\mathrm{MSD}_{\mathrm{cta},k} \approx \mathrm{MSD}_{\mathrm{cons},k}$ for all $k$. As a result, in the following, we only compare $\mathrm{MSD}_{\mathrm{atc},k}$, $\mathrm{MSD}_{\mathrm{cta},k}$, and $\mathrm{MSD}_{\mathrm{ncop},k}$. In particular, we will show that under certain conditions on the combination matrix $A$, the ATC diffusion strategy continues to perform the best in terms of the MSD at the individual nodes in comparison to the other strategies. To do so, starting from (61) and the expressions for $\{\lambda_{l,m}(\mathcal{B}), \mathcal{Y}\}$ in Table III, we can express the MSD at node $k$ for the ATC diffusion strategy as:

$$\mathrm{MSD}_{\mathrm{atc},k} = \sum_{m=1}^{M}\mu^2\lambda_m(R_u)\sum_{l_1,l_2=1}^{N}\frac{\lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(e_k^T r_{l_1}s_{l_1}^*\Sigma_v s_{l_2}r_{l_2}^* e_k)}{1 - \lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(1 - \mu\lambda_m(R_u))^2} \triangleq \sum_{m=1}^{M}\mathrm{MSD}_{\mathrm{atc},k}(m) \qquad (70)$$

where we introduced the notation $\mathrm{MSD}_{\mathrm{atc},k}(m)$ to denote the MSD component at node $k$ that is contributed by the $m$th eigenvalue of $R_u$, i.e.,

$$\mathrm{MSD}_{\mathrm{atc},k}(m) \triangleq \mu^2\lambda_m(R_u)\sum_{l_1,l_2=1}^{N}\frac{\lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(e_k^T r_{l_1}s_{l_1}^*\Sigma_v s_{l_2}r_{l_2}^* e_k)}{1 - \lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(1 - \mu\lambda_m(R_u))^2}. \qquad (71)$$

In a similar vein, we can define the corresponding $\mathrm{MSD}_k(m)$ terms for the other strategies. We list these terms in Table IV in two equivalent forms (we will use the series form later). We first have the following useful preliminary result.

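Lemma 3 (Useful comparisons). The following ratios are positive and independent of the node index $k$:

$$\frac{\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m)}{\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{cta},k}(m)} = \frac{1}{(1 - \mu\lambda_m(R_u))^2} > 0 \qquad (72)$$

$$\frac{\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m)}{\mathrm{MSD}_{\mathrm{cta},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m)} = \frac{1}{1 - (1 - \mu\lambda_m(R_u))^2} > 0. \qquad (73)$$

TABLE IV

Expressions for $\mathrm{MSD}_k(m)$ in series form and eigen-form.

| Strategy | Series form | Eigen-form |
|---|---|---|
| ATC diffusion (10) | $\mu^2\lambda_m(R_u)\sum_{j=0}^{\infty}(1 - \mu\lambda_m(R_u))^{2j}\cdot e_k^T A^{T(j+1)}\Sigma_v A^{j+1} e_k$ | $\mu^2\lambda_m(R_u)\sum_{l_1,l_2=1}^{N}\dfrac{\lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(e_k^T r_{l_1}s_{l_1}^*\Sigma_v s_{l_2}r_{l_2}^* e_k)}{1 - \lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(1 - \mu\lambda_m(R_u))^2}$ |
| CTA diffusion | $\mu^2\lambda_m(R_u)\sum_{j=0}^{\infty}(1 - \mu\lambda_m(R_u))^{2j}\cdot e_k^T A^{Tj}\Sigma_v A^{j} e_k$ | $\mu^2\lambda_m(R_u)\sum_{l_1,l_2=1}^{N}\dfrac{e_k^T r_{l_1}s_{l_1}^*\Sigma_v s_{l_2}r_{l_2}^* e_k}{1 - \lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(1 - \mu\lambda_m(R_u))^2}$ |
| Non-cooperative | $\mu^2\lambda_m(R_u)\sum_{j=0}^{\infty}(1 - \mu\lambda_m(R_u))^{2j}\cdot\sigma_{v,k}^2$ | $\dfrac{\mu^2\lambda_m(R_u)}{1 - (1 - \mu\lambda_m(R_u))^2}\sum_{l_1,l_2=1}^{N} e_k^T r_{l_1}s_{l_1}^*\Sigma_v s_{l_2}r_{l_2}^* e_k$ |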



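Proof: From the eigen-forms of $\{\mathrm{MSD}_k(m)\}$ in Table IV, the differences between $\mathrm{MSD}_{\mathrm{atc},k}(m)$, $\mathrm{MSD}_{\mathrm{cta},k}(m)$, and $\mathrm{MSD}_{\mathrm{ncop},k}(m)$ are given by:

$$\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m) = \frac{\mu^2\lambda_m(R_u)}{1 - (1 - \mu\lambda_m(R_u))^2}\cdot c_k(m) \qquad (74)$$

$$\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{cta},k}(m) = \frac{\mu^2\lambda_m(R_u)\cdot(1 - \mu\lambda_m(R_u))^2}{1 - (1 - \mu\lambda_m(R_u))^2}\cdot c_k(m) \qquad (75)$$

$$\mathrm{MSD}_{\mathrm{cta},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m) = \mu^2\lambda_m(R_u)\cdot c_k(m) \qquad (76)$$

where

$$c_k(m) \triangleq \sum_{l_1,l_2=1}^{N}\frac{[1 - \lambda_{l_1}(A)\lambda_{l_2}^*(A)]\cdot(e_k^T r_{l_1}s_{l_1}^*\Sigma_v s_{l_2}r_{l_2}^* e_k)}{1 - \lambda_{l_1}(A)\lambda_{l_2}^*(A)\cdot(1 - \mu\lambda_m(R_u))^2}. \qquad (77)$$

Then, dividing (74) by (75) and (74) by (76), we arrive at (72)-(73). ■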



Lemma 4 (Useful ordering). The relation among $\mathrm{MSD}_{\mathrm{atc},k}(m)$, $\mathrm{MSD}_{\mathrm{cta},k}(m)$, and $\mathrm{MSD}_{\mathrm{ncop},k}(m)$ is either

$$\mathrm{MSD}_{\mathrm{atc},k}(m) \le \mathrm{MSD}_{\mathrm{cta},k}(m) \le \mathrm{MSD}_{\mathrm{ncop},k}(m) \qquad (78)$$

or

$$\mathrm{MSD}_{\mathrm{atc},k}(m) \ge \mathrm{MSD}_{\mathrm{cta},k}(m) \ge \mathrm{MSD}_{\mathrm{ncop},k}(m). \qquad (79)$$

Proof: Assume first that $\mathrm{MSD}_{\mathrm{atc},k}(m) \le \mathrm{MSD}_{\mathrm{ncop},k}(m)$. Then, using (72), we get $\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{cta},k}(m) \ge 0$. Similarly, from (73), we get $\mathrm{MSD}_{\mathrm{cta},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m) \ge 0$. We conclude that relation (78) holds in this case. Assume instead that $\mathrm{MSD}_{\mathrm{atc},k}(m) \ge \mathrm{MSD}_{\mathrm{ncop},k}(m)$. Then, a similar argument will show that (79) should hold. ■

The above result is useful since it allows us to deduce the relation among $\mathrm{MSD}_{\mathrm{atc},k}(m)$, $\mathrm{MSD}_{\mathrm{cta},k}(m)$, and $\mathrm{MSD}_{\mathrm{ncop},k}(m)$ by only knowing the relation between any two of them. To proceed, we note that we can alternatively express the $\mathrm{MSD}_k(m)$ terms in an equivalent series form. For example, expression (71) can be written as:

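$$\begin{aligned}
\mathrm{MSD}_{\mathrm{atc},k}(m) &= \mu^2\lambda_m(R_u)\sum_{j=0}^{\infty}\sum_{l_1,l_2=1}^{N}(1 - \mu\lambda_m(R_u))^{2j}\cdot\lambda_{l_1}^{j+1}(A)\cdot\lambda_{l_2}^{*(j+1)}(A)\cdot(e_k^T r_{l_1}s_{l_1}^*\Sigma_v s_{l_2}r_{l_2}^* e_k) \\
&= \mu^2\lambda_m(R_u)\sum_{j=0}^{\infty}(1 - \mu\lambda_m(R_u))^{2j}\cdot e_k^T\left(\sum_{l_1=1}^{N}\lambda_{l_1}^{j+1}(A)\,r_{l_1}s_{l_1}^*\right)\Sigma_v\left(\sum_{l_2=1}^{N}\lambda_{l_2}^{*(j+1)}(A)\,s_{l_2}r_{l_2}^*\right)e_k \\
&= \mu^2\lambda_m(R_u)\sum_{j=0}^{\infty}(1 - \mu\lambda_m(R_u))^{2j}\cdot e_k^T A^{T(j+1)}\Sigma_v A^{j+1} e_k.
\end{aligned} \qquad (80)$$

In a similar manner, we can obtain the corresponding $\mathrm{MSD}_k(m)$ series forms for the other strategies and we list these in Table IV. In the following, we provide conditions to guarantee that the individual node performance in the ATC diffusion strategy outperforms the other strategies.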
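The series forms lend themselves to a direct numerical check. The sketch below truncates the three series for a real combination matrix and one eigenvalue $\lambda_m(R_u)$, writing them as $\mu^2\lambda_m\sum_j(1-\mu\lambda_m)^{2j}\,e_k^T A^{T(j+1)}\Sigma_v A^{j+1}e_k$ (ATC), the same with $A^{Tj}\Sigma_v A^{j}$ (CTA), and with $\sigma_{v,k}^2$ (non-cooperative). The example matrix, noise variances, truncation length, and function names are assumptions made for illustration.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def node_msd_series(a_mat, noise_vars, mu, lam_m, k, terms=3000):
    """Truncated series forms of MSD_k(m): returns (atc, cta, ncop).

    a_mat[l][k] holds a_{l,k}; k is a zero-based node index.
    """
    n = len(a_mat)
    at = transpose(a_mat)
    # kernel = A^{Tj} Sigma_v A^j, starting from j = 0.
    kernel = [[noise_vars[i] if i == j else 0.0 for j in range(n)]
              for i in range(n)]
    beta = (1.0 - mu * lam_m) ** 2
    weight, atc, cta, ncop = 1.0, 0.0, 0.0, 0.0
    for _ in range(terms):
        nxt = matmul(matmul(at, kernel), a_mat)   # A^{T(j+1)} Sigma_v A^{j+1}
        atc += weight * nxt[k][k]
        cta += weight * kernel[k][k]
        ncop += weight * noise_vars[k]
        kernel, weight = nxt, weight * beta
    factor = mu ** 2 * lam_m
    return factor * atc, factor * cta, factor * ncop


# Hypothetical two-node example: A = [[1-a, b], [a, 1-b]] with a = 0.6, b = 0.3.
A = [[0.4, 0.3], [0.6, 0.7]]
results = [node_msd_series(A, [0.2, 0.1], mu=0.05, lam_m=2.0, k=k)
           for k in range(2)]
print(results)
```

In this example the ratio $(\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m))/(\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{cta},k}(m))$ comes out as $1/(1-\mu\lambda_m)^2$ at both nodes, in line with Lemma 3.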

Theorem 4 (Comparing individual MSDs). If the combination matrix $A$ satisfies

$$\Sigma_v - A^T\Sigma_v A \ge 0 \qquad (81)$$

where $\Sigma_v$ is the noise variance (diagonal) matrix defined by (50), then:

$$\mathrm{MSD}_{\mathrm{atc},k} \le \mathrm{MSD}_{\mathrm{cta},k} \le \mathrm{MSD}_{\mathrm{ncop},k}. \qquad (82)$$

Proof: From the series forms of $\{\mathrm{MSD}_k(m)\}$ in Table IV, the difference $\mathrm{MSD}_{\mathrm{cta},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m)$ is given by:

$$\mathrm{MSD}_{\mathrm{cta},k}(m) - \mathrm{MSD}_{\mathrm{atc},k}(m) = \mu^2\lambda_m(R_u)\sum_{j=0}^{\infty}(1 - \mu\lambda_m(R_u))^{2j}\,e_k^T A^{Tj}\left(\Sigma_v - A^T\Sigma_v A\right)A^{j}e_k. \qquad (83)$$

Since $\Sigma_v - A^T\Sigma_v A \ge 0$, we conclude that $\mathrm{MSD}_{\mathrm{cta},k}(m) \ge \mathrm{MSD}_{\mathrm{atc},k}(m)$ for all $m$. Then, applying Lemma 4, we obtain relation (82). ■
Condition (81) essentially means that the combination matrix $A$ should not magnify the noise effect across the network. However, in general, condition (81) is restrictive in the sense that over the set of feasible diagonalizable left-stochastic matrices $A$ satisfying $a_{l,k} = 0$ if $l \notin \mathcal{N}_k$, the set of combination matrices $A$ satisfying (81) can be small. We illustrate this situation by reconsidering the two-node network (33) for which

$$\Sigma_v - A^T\Sigma_v A = \sigma_{v,2}^2\begin{bmatrix} 2at - a^2(1+t) & -(1-a)bt - a(1-b) \\ -(1-a)bt - a(1-b) & 2b - b^2(1+t) \end{bmatrix} \qquad (84)$$

where $t = \sigma_{v,1}^2/\sigma_{v,2}^2$ denotes the ratio of noise variances at nodes 1 and 2. Note from

$$\det(\Sigma_v - A^T\Sigma_v A) = -\sigma_{v,2}^4\,(a - bt)^2 \le 0 \qquad (85)$$

that equality holds in (85) if, and only if,

$$a = tb. \qquad (86)$$

That is, when $a \ne tb$, the matrix $(\Sigma_v - A^T\Sigma_v A)$ has two eigenvalues with different signs. Thus, the only way to ensure $\Sigma_v - A^T\Sigma_v A \ge 0$ in this case is to set $a = tb$ and, thus, the matrix $(\Sigma_v - A^T\Sigma_v A)$ will have at least one eigenvalue at zero since its determinant will be zero. To ensure $\Sigma_v - A^T\Sigma_v A \ge 0$, its other eigenvalue, which is equal to $\sigma_{v,2}^2\,b(1 + t^2)(2 - b - bt)$, needs to be greater than or equal to zero. It follows that $b$ must satisfy:

$$0 \le b \le \frac{2}{1+t}. \qquad (87)$$

Moreover, since $a$ and $b$ must lie within the interval $[0, 1]$, we conclude from (86) that $b$ must also satisfy:

$$0 \le b \le \min\{1, 1/t\}. \qquad (88)$$

It can be verified that condition (88) implies condition (87) since $\min\{1, 1/t\} \le 2/(1+t)$. That is, for any left-stochastic matrix $A$ from (33) satisfying $a = tb$ and (88), relation (82) holds and both nodes improve their own MSDs by employing the diffusion strategies. Note that condition (86) represents a line segment in the unit square $a, b \in [0, 1]$ (see Fig. 4). In the following, we relax condition (81) with a mild constraint on the network topology.

In addition to Assumption 2, we further assume that the combination matrix $A$ is primitive (also called regular). This means that there exists an integer $j$ such that the $j$th power of $A$ has positive entries, $[A^j]_{l,k} > 0$ for all $l$ and $k$ [47]. We remark that for any connected network (where a path always exists between any two arbitrary nodes), if the combination weights satisfy $a_{l,k} > 0$ for $l \in \mathcal{N}_k$, then $A$ is primitive. Now, since $A$ is primitive, it follows from the Perron-Frobenius Theorem [47] that $(A^T)^j$ converges to the rank-one matrix:

$$\lim_{j\to\infty}(A^T)^j = r_1 s_1^T. \qquad (89)$$

Since $A$ is left-stochastic and from (52), $r_1$ and $s_1$ satisfy:

$$r_1 = \frac{\mathbb{1}}{\sqrt{N}} \quad \text{and} \quad \frac{s_1^T\mathbb{1}}{\sqrt{N}} = 1. \qquad (90)$$

Fig. 4. Comparison of individual node MSD using $N = 2$ and $t = \sigma_{v,1}^2/\sigma_{v,2}^2$ (left panel, Case 1: $t = 2 > 1$; right panel, Case 2: $t = 1/2 < 1$). There exists a step-size region such that $\mathrm{MSD}_{\mathrm{atc},k} \le \mathrm{MSD}_{\mathrm{cta},k} \le \mathrm{MSD}_{\mathrm{ncop},k}$ for $k = 1, 2$ when the parameters $a$ and $b$ lie in the shaded regions. The dashed lines indicate condition (86).



Theorem 5 (Comparing individual MSDs for regular networks). For any primitive and diagonalizable combination matrix $A$, if

$$\frac{s_1^T\Sigma_v s_1}{N} < \sigma_{v,k}^2 \qquad (91)$$

for all $k$, then there exists $\mu^o > 0$ so that for any step-size $\mu$ satisfying $0 < \mu < \mu^o$ it holds:

$$\mathrm{MSD}_{\mathrm{atc},k} \le \mathrm{MSD}_{\mathrm{cta},k} \le \mathrm{MSD}_{\mathrm{ncop},k}. \qquad (92)$$

Proof: See Appendix E. ■

We show in Appendix F that for any primitive $A$, condition (81) implies condition (91). To illustrate these two conditions, we consider again the two-node network. It can be verified that $s_1^T$ for $A^T$ in (33) has the form $s_1^T = \left[\sqrt{2}\,b/(a+b) \ \ \ \sqrt{2}\,a/(a+b)\right]$. Then, some algebra shows that condition (91) becomes

$$(t - 1)a + 2bt > 0 \quad \text{and} \quad 2a + (1 - t)b > 0. \qquad (93)$$

Recall that $t = \sigma_{v,1}^2/\sigma_{v,2}^2$. We illustrate condition (93), along with condition (86), in Fig. 4. We observe that condition (86), shown as the dashed lines, is contained in condition (93), shown as the shaded regions, and that compared to condition (86), condition (93) enlarges the region of $A$ for which the ATC diffusion strategy performs the best in terms of the individual MSD performance.
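The two conditions for the two-node network can be compared pointwise. The sketch below tests the nonnegative-definiteness condition on $\Sigma_v - A^T\Sigma_v A$ through the trace and determinant of the symmetric $2 \times 2$ matrix obtained for $A = [[1-a,\ b],[a,\ 1-b]]$ (normalized by $\sigma_{v,2}^2$), and tests the pair of inequalities in (93). Function names and the tolerance are illustrative assumptions.

```python
def noise_condition(a, b, t, tol=1e-12):
    """True when Sigma_v - A^T Sigma_v A >= 0 for the two-node network.

    t = sigma_{v,1}^2 / sigma_{v,2}^2; entries are normalized by sigma_{v,2}^2.
    """
    m11 = 2.0 * a * t - a * a * (1.0 + t)
    m22 = 2.0 * b - b * b * (1.0 + t)
    m12 = -(1.0 - a) * b * t - a * (1.0 - b)
    det = m11 * m22 - m12 * m12
    # A symmetric 2x2 matrix is nonnegative-definite exactly when both its
    # trace and its determinant are nonnegative.
    return (m11 + m22) >= -tol and det >= -tol


def relaxed_condition(a, b, t):
    """The pair of inequalities in (93)."""
    return (t - 1.0) * a + 2.0 * b * t > 0.0 and 2.0 * a + (1.0 - t) * b > 0.0


t = 2.0
on_line = (0.6, 0.3)     # a = t * b, with b <= min(1, 1/t)
off_line = (0.5, 0.5)    # a != t * b
outside = (0.1, 0.9)

for a, b in (on_line, off_line, outside):
    print((a, b), noise_condition(a, b, t), relaxed_condition(a, b, t))
```

The point on the line $a = tb$ passes both tests, the point $(0.5, 0.5)$ passes only the relaxed test, and $(0.1, 0.9)$ fails both, which is consistent with the dashed line lying inside the shaded region.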

V. Simulation Results 

We consider a network with 20 nodes and random topology. The regression covariance matrix $R_u$ is diagonal with entries randomly generated from $[2, 4]$, and the noise variances $\{\sigma_{v,k}^2\}$ are randomly generated over $[-30, -10]$ dB (see Fig. 5). The network estimates a $10 \times 1$ (i.e., $M = 10$) unknown vector $w^o$ with every entry equal to $1/\sqrt{10}$.

Fig. 5. Network topology and noise and data power profiles at the nodes. The number next to a node denotes the node index.

TABLE V

Combination rules used in the simulations, $a_{l,k} = 0$ if $l \notin \mathcal{N}_k$.

| Name | Rule |
|---|---|
| Relative-variance [48] | $a_{l,k} = \sigma_{v,l}^{-2}\big/\sum_{j\in\mathcal{N}_k}\sigma_{v,j}^{-2}$ |
| Uniform [27] | $a_{l,k} = 1/n_k$ |
| Metropolis [49] | $a_{l,k} = 1/\max\{n_k, n_l\}$ if $l \in \mathcal{N}_k\setminus\{k\}$; $a_{k,k} = 1 - \sum_{l\in\mathcal{N}_k\setminus\{k\}} a_{l,k}$ |
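The three combination rules can be sketched for a small graph as follows. The code assumes that each neighborhood $\mathcal{N}_k$ includes node $k$ itself, that $n_k = |\mathcal{N}_k|$, and that the Metropolis diagonal entry is whatever makes each column sum to one; the example graph, noise variances, and function names are hypothetical.

```python
def neighborhoods(n, edges):
    """Neighborhood of each node, including the node itself."""
    nb = [{k} for k in range(n)]
    for i, j in edges:
        nb[i].add(j)
        nb[j].add(i)
    return nb


def uniform_rule(nb):
    n = len(nb)
    return [[1.0 / len(nb[k]) if l in nb[k] else 0.0 for k in range(n)]
            for l in range(n)]


def relative_variance_rule(nb, noise_vars):
    n = len(nb)
    a = [[0.0] * n for _ in range(n)]
    for k in range(n):
        total = sum(1.0 / noise_vars[j] for j in nb[k])
        for l in nb[k]:
            a[l][k] = (1.0 / noise_vars[l]) / total
    return a


def metropolis_rule(nb):
    n = len(nb)
    a = [[0.0] * n for _ in range(n)]
    for k in range(n):
        others = nb[k] - {k}
        for l in others:
            a[l][k] = 1.0 / max(len(nb[k]), len(nb[l]))
        # Assumed diagonal entry: the remainder that makes column k sum to one.
        a[k][k] = 1.0 - sum(a[l][k] for l in others)
    return a


def column_sums(a):
    return [sum(a[l][k] for l in range(len(a))) for k in range(len(a))]


nb = neighborhoods(4, [(0, 1), (1, 2), (2, 3), (0, 2)])
rules = {
    "relative-variance": relative_variance_rule(nb, [0.01, 0.1, 0.05, 0.001]),
    "uniform": uniform_rule(nb),
    "metropolis": metropolis_rule(nb),
}
for name, a in rules.items():
    print(name, column_sums(a))
```

All three matrices are left-stochastic by construction (entry `a[l][k]` holds $a_{l,k}$), and only the Metropolis matrix is symmetric in general.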

The transient network MSD over time is shown on the left hand side of Fig. 6 with three possible combination rules: relative-variance [48], uniform [27], and Metropolis [49] (see Table V). Note that the matrix $A$ for the Metropolis rule is symmetric. The step-size $\mu$ is set to $\mu = 0.02$. We observe that, as expected, the ATC diffusion strategy outperforms the other strategies, especially for the relative-variance rule. The results also suggest that some conventional choices of combination weights, such as the Metropolis rule, may not be the most suitable for adaptation in the presence of both noisy and streaming data because such weights do not take into account the noise profile across the nodes (see, e.g., [48] for more details on this issue). We further show the steady-state MSD at the individual nodes on the right hand side of Fig. 6. We observe that the ATC diffusion strategy achieves the lowest MSD at each node in comparison to the other strategies. These observations are in agreement with the results predicted by the theoretical analysis. The theoretical expressions for MSDs from (48)-(49) are also depicted in Fig. 6 for the ATC diffusion strategy and match well with simulations.

Fig. 6. Transient network MSD over time (left, with peak values normalized to 0 dB) and steady-state MSD at the individual nodes (right) for (a)-(b) the relative-variance, (c)-(d) uniform, and (e)-(f) Metropolis rules. The dashed lines on the left/right hand side indicate the theoretical network/individual MSD from (63)/(61) for the ATC diffusion strategy.

We further compare the mean-square performance of the distributed strategies for larger step-sizes. We set the step-size to $\mu = 0.075$ and use the relative-variance combination rule. The transient network MSD over time is shown on the left hand side of Fig. 7. We observe that the ATC and CTA diffusion strategies have the same convergence rate and converge faster than the consensus strategy. Moreover, the diffusion strategies achieve lower network MSD than the consensus strategy. We also show the steady-state MSD at the individual nodes on the right hand side of Fig. 7. We see again that ATC diffusion performs the best in comparison to the other strategies at each individual node.

Fig. 7. Transient network MSD over time (left) and steady-state MSD at the individual nodes (right) for the relative-variance combination rule using $\mu = 0.075$.

VI. Concluding Remarks 

We compared analytically several cooperative estimation strategies, including ATC diffusion, CTA 
diffusion, and consensus for distributed estimation over networks. The results show that diffusion networks 
are more stable than consensus networks. Moreover, the stability of diffusion networks is independent of 
the combination weights, whereas consensus networks can become unstable even if all individual nodes 
are stable. Furthermore, in steady-state, the ATC diffusion strategy performs the best not only in terms 
of the network MSD, but also in terms of the MSDs at the individual nodes. 

Appendix A 
Proof of Theorem 1

First, note that the matrices $\{\mathcal{B}\}$ for the ATC and CTA diffusion strategies given by (29) have the same eigenvalues (and, therefore, the same spectral radius) because for any matrices $X$ and $Y$ of compatible dimensions, the matrix products $XY$ and $YX$ have the same eigenvalues [47]. So let us evaluate the spectral radius of $\mathcal{B}_{\mathrm{atc}}$. To do so, we introduce a convenient block matrix norm, and denote it by $\|\cdot\|_b$; it is defined as follows. Let $X$ be an $N \times N$ block matrix with blocks of size $M \times M$ each. Its block matrix norm is defined as:

$$\|X\|_b \triangleq \max_{1\le k\le N}\left(\sum_{l=1}^{N}\|X_{k,l}\|_2\right) \qquad (94)$$

where $X_{k,l}$ denotes the $(k, l)$th block of $X$ and $\|\cdot\|_2$ denotes the 2-induced norm (largest singular value) of its matrix argument. Now, since $\{I_{NM}, \mathcal{M}, \mathcal{R}\}$ are block diagonal matrices, the following property holds:

$$\|I_{NM} - \mathcal{M}\mathcal{R}\|_b = \max_{1\le k\le N}\|I_M - \mu_k R_{u,k}\|_2 = \max_{1\le k\le N}\rho(I_M - \mu_k R_{u,k}) = \rho(\mathcal{B}_{\mathrm{ncop}}) \qquad (95)$$

where we used the fact that the 2-induced norm of any Hermitian matrix coincides with its spectral radius. In addition, since $A$ is a left-stochastic matrix, it holds that

$$\|\mathcal{A}^T\|_b = \max_{1\le k\le N}\left(\sum_{l=1}^{N}\|a_{l,k}I_M\|_2\right) = \max_{1\le k\le N}\left(\sum_{l=1}^{N}a_{l,k}\right) = 1. \qquad (96)$$

Accordingly, using the fact that the spectral radius of a matrix is upper bounded by any norm of the matrix [47], we get:

$$\rho(\mathcal{B}_{\mathrm{atc}}) \le \|\mathcal{A}^T(I_{NM} - \mathcal{M}\mathcal{R})\|_b \le \|\mathcal{A}^T\|_b\cdot\|I_{NM} - \mathcal{M}\mathcal{R}\|_b = \rho(\mathcal{B}_{\mathrm{ncop}}) \qquad (97)$$

which establishes (30).

Now, assume $A$ is symmetric. Since it is also left-stochastic, it follows that its eigenvalues are real and lie inside the interval $[-1, 1]$. Therefore, $(I_{NM} - \mathcal{A}^T)$ is nonnegative-definite. Moreover, since $\mathcal{M}$ and $\mathcal{R}$ commute, i.e., $\mathcal{R}\mathcal{M} = \mathcal{M}\mathcal{R}$, it can be verified that $\mathcal{B}_{\mathrm{cons}}$ in (28) and $\mathcal{B}_{\mathrm{ncop}}$ in (25) are Hermitian. In addition, the matrices $\mathcal{B}_{\mathrm{cons}}$ and $\mathcal{B}_{\mathrm{ncop}}$ are related as follows:

$$\mathcal{B}_{\mathrm{ncop}} = \mathcal{B}_{\mathrm{cons}} + (I_{NM} - \mathcal{A}^T) \qquad (98)$$

with $(I_{NM} - \mathcal{A}^T) \ge 0$. Using Weyl's Theorem$^1$ [47], we arrive at (31). Following a similar argument, it holds for symmetric $A$ that

$$\lambda_l\{\lambda_{\min}(A)\cdot I_{NM} - \mathcal{M}\mathcal{R}\} \le \lambda_l(\mathcal{B}_{\mathrm{cons}}) \quad \text{for } l = 1, 2, \ldots, NM. \qquad (99)$$

Thus, the matrix $\mathcal{B}_{\mathrm{cons}}$ is stable (namely, $-1 < \lambda_l(\mathcal{B}_{\mathrm{cons}}) < 1$ for $l = 1, 2, \ldots, NM$) if

$$\lambda_l(\lambda_{\min}(A)\cdot I_{NM} - \mathcal{M}\mathcal{R}) > -1 \qquad (100)$$

$$\lambda_l(\mathcal{B}_{\mathrm{ncop}}) < 1 \qquad (101)$$

for $l = 1, 2, \ldots, NM$, or, equivalently,

$$\lambda_{\min}(A) - \mu_k\lambda_m(R_{u,k}) > -1 \qquad (102)$$

$$1 - \mu_k\lambda_m(R_{u,k}) < 1 \qquad (103)$$

for $k = 1, 2, \ldots, N$ and $m = 1, 2, \ldots, M$. We then arrive at (32).

$^1$Let $\{D', D, \Delta D\}$ be $M \times M$ Hermitian matrices with ordered eigenvalues $\{\lambda_m(D'), \lambda_m(D), \lambda_m(\Delta D)\}$, i.e., $\lambda_1(D) \ge \lambda_2(D) \ge \ldots \ge \lambda_M(D)$, and likewise for the eigenvalues of $\{D', \Delta D\}$. Weyl's Theorem states that if $D' = D + \Delta D$, then

$$\lambda_m(D) + \lambda_M(\Delta D) \le \lambda_m(D') \le \lambda_m(D) + \lambda_1(\Delta D)$$

for $1 \le m \le M$. When $\Delta D \ge 0$, it holds that $\lambda_m(D') \ge \lambda_m(D)$.

Appendix B 
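Proof of Theorem 2

For the diffusion strategies, from Table III and since $\rho(A) = 1$, we have

$$\rho(\mathcal{B}_{\mathrm{diff}}) = \rho[A^T \otimes (I_M - \mu R_u)] = \rho(A^T)\cdot\rho(I_M - \mu R_u) = \rho(I_M - \mu R_u) = \rho(\mathcal{B}_{\mathrm{ncop}}). \qquad (104)$$

Moreover, since $1 \in \{\lambda_l(A)\}$, we have

$$\rho(\mathcal{B}_{\mathrm{ncop}}) = \max_{1\le m\le M}|1 - \mu\lambda_m(R_u)| \le \max_{1\le l\le N}\max_{1\le m\le M}|\lambda_l(A) - \mu\lambda_m(R_u)| = \rho(\mathcal{B}_{\mathrm{cons}}) \qquad (105)$$

and we arrive at (56). It is obvious that when $A = I_N$, then equality in (105) holds and $\rho(\mathcal{B}_{\mathrm{ncop}}) = \rho(\mathcal{B}_{\mathrm{cons}})$. We now consider the case when $A \ne I_N$. Note that the spectral radius of $\mathcal{B}_{\mathrm{ncop}}$ is given by

$$\rho(\mathcal{B}_{\mathrm{ncop}}) = \max\{1 - \mu\lambda_{\min}(R_u),\ -1 + \mu\lambda_{\max}(R_u)\}. \qquad (106)$$

We first verify that equality in (105) holds only when $\rho(\mathcal{B}_{\mathrm{ncop}}) = 1 - \mu\lambda_{\min}(R_u)$. Indeed, if $\rho(\mathcal{B}_{\mathrm{ncop}}) = -1 + \mu\lambda_{\max}(R_u) > 0$, we have that $\mu\lambda_{\max}(R_u) > 1$ and we get from (105) that

$$\begin{aligned}
\rho(\mathcal{B}_{\mathrm{cons}}) &= \max_{1\le l\le N}\max_{1\le m\le M}|\lambda_l(A) - \mu\lambda_m(R_u)| \\
&\ge |\lambda_l(A) - \mu\lambda_{\max}(R_u)| \\
&\ge |\mathrm{Re}\{\lambda_l(A)\} - \mu\lambda_{\max}(R_u)| \\
&= -\mathrm{Re}\{\lambda_l(A)\} + \mu\lambda_{\max}(R_u)
\end{aligned} \qquad (107)$$

since $\mathrm{Re}\{\lambda_l(A)\} \le 1$, where $\mathrm{Re}\{\cdot\}$ denotes the real part of its argument. Since $A \ne I_N$, there exists some $l$ such that $\mathrm{Re}\{\lambda_l(A)\} < 1$ and then $\rho(\mathcal{B}_{\mathrm{cons}}) > -1 + \mu\lambda_{\max}(R_u) = \rho(\mathcal{B}_{\mathrm{ncop}})$. Now, assume that $\rho(\mathcal{B}_{\mathrm{ncop}}) = 1 - \mu\lambda_{\min}(R_u)$. Then, equality in (105) holds if

$$|\lambda_l(A) - \mu\lambda_m(R_u)| \le \rho(\mathcal{B}_{\mathrm{ncop}}) \qquad (108)$$

for all $l$ and $m$. It is obvious that relation (108) holds for $l = 1$ since $\lambda_1(A) = 1$ and

$$\rho(\mathcal{B}_{\mathrm{ncop}}) = \max_{1\le m\le M}|1 - \mu\lambda_m(R_u)| \ge |\lambda_1(A) - \mu\lambda_m(R_u)|. \qquad (109)$$

For $l = 2, 3, \ldots, N$, by the triangle inequality, we have that $|\lambda_l(A) - \mu\lambda_m(R_u)| \le |\lambda_l(A)| + \mu\lambda_{\max}(R_u)$. Hence, the inequality in (108) holds if

$$|\lambda_l(A)| + \mu\lambda_{\max}(R_u) \le 1 - \mu\lambda_{\min}(R_u) \qquad (110)$$

for $l = 2, 3, \ldots, N$ and we arrive at (57).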
Appendix C
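Proof of Lemma 2

From Lemma 1, the eigen-decomposition for the matrix power $\mathcal{B}^j$ is given by:

$$\mathcal{B}^j = \sum_{l=1}^{N}\sum_{m=1}^{M}\lambda_{l,m}^j(\mathcal{B})\cdot r_{l,m}^b s_{l,m}^{b*}. \qquad (111)$$

Using (111), we can rewrite the MSD at node $k$ from (49) as:

$$\begin{aligned}
\mathrm{MSD}_k &= \sum_{j=0}^{\infty}\sum_{l_1,l_2=1}^{N}\sum_{m_1,m_2=1}^{M}\lambda_{l_1,m_1}^j(\mathcal{B})\,\lambda_{l_2,m_2}^{*j}(\mathcal{B})\cdot\mathrm{Tr}\left[(e_k^T \otimes I_M)\,r_{l_1,m_1}^b s_{l_1,m_1}^{b*}\,\mathcal{Y}\,s_{l_2,m_2}^{b}r_{l_2,m_2}^{b*}\,(e_k \otimes I_M)\right] \\
&= \sum_{l_1,l_2=1}^{N}\sum_{m_1,m_2=1}^{M}\frac{\left[r_{l_2,m_2}^{b*}(e_k \otimes I_M)(e_k^T \otimes I_M)r_{l_1,m_1}^b\right]\cdot s_{l_1,m_1}^{b*}\,\mathcal{Y}\,s_{l_2,m_2}^{b}}{1 - \lambda_{l_1,m_1}(\mathcal{B})\,\lambda_{l_2,m_2}^*(\mathcal{B})}
\end{aligned} \qquad (112)$$

where we used $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ and the expression for the infinite sum of a geometric series. Using (54), we have:

$$r_{l_2,m_2}^{b*}(e_k \otimes I_M)(e_k^T \otimes I_M)r_{l_1,m_1}^b = (r_{l_2}^* e_k e_k^T r_{l_1}) \otimes (z_{m_2}^* z_{m_1}) = (r_{l_2}^* e_k e_k^T r_{l_1})\cdot\delta_{m_1 m_2} \qquad (113)$$

since the eigenvectors $\{z_m\}$ are orthonormal. Substituting (113) into (112), we arrive at (61). Likewise, from (47) and (61), the network MSD is given by

$$\mathrm{MSD} = \frac{1}{N}\sum_{l_1,l_2=1}^{N}\sum_{m=1}^{M}\frac{\left(\sum_{k=1}^{N}r_{l_2}^* e_k e_k^T r_{l_1}\right)\cdot s_{l_1,m}^{b*}\,\mathcal{Y}\,s_{l_2,m}^{b}}{1 - \lambda_{l_1,m}(\mathcal{B})\,\lambda_{l_2,m}^*(\mathcal{B})}. \qquad (114)$$

From assumption (62), we can establish (63) since

$$\sum_{k=1}^{N}r_{l_2}^* e_k e_k^T r_{l_1} = r_{l_2}^*\cdot I_N\cdot r_{l_1} \approx \delta_{l_1 l_2}. \qquad (115)$$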

Appendix D 
Proof of Theorem [3] 

We first verify that MSD atc < MSD cta , MSD cta < MSD ncop , and MSD atc < MSD cons . We show the 
result by verifying that the individual terms on the right hand side of (l63l) for the various strategies have 
the same ordering. That is, from (l63l and Table III, we verify that the following ratios, which correspond 
to MSD atc < MSD cta , MSD cta < MSD ncop , and MSD atc < MSD cons , respectively, are upper bounded 
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by one: 

IM^4)| 2 <1 (116) 

1 - (1 - ^Xm(Ru)) 2 



l-|A/(A)p.(l-MA m (i4)) 2 
lAK^I^l-lAKA)-^^)! 2 ) 



< 1 (117) 
<1 (118) 



l-|A ; (A)| 2 -(l- M A m (i? u )) 2 
for all I and m. Note that relations (11 16l l- (|l 17t hold since ^(^4)1 < 1 for all I in view of the fact that A 
is left-stochastic and, hence, p(A) = 1. We therefore established (j64). On the other hand, relation (II 181 ) 
would hold if, and only if, 

|A^)| 2 [1 + (1 - fiX m (Ru)) 2 - \Xi(A) - fi\ m (R u )\ 2 ] < 1. (119) 

To establish that (119) is true for all $l$ and $m$, we introduce the compact notation $\lambda = \lambda_l(A)$, $\delta =
\mu\lambda_m(R_u)$, and consider the following function of two variables:

\[
f(\lambda, \delta) = |\lambda|^2 \left[1 + (1 - \delta)^2 - |\lambda - \delta|^2\right] \quad \text{with } |\lambda| \le 1,\ \delta \in (0, 2), \text{ and } |\lambda - \delta| < 1. \tag{120}
\]

The range for $\delta$ ensures condition (27) and the stability of the diffusion network, while the range for
$|\lambda - \delta|$ ensures that the consensus network is stable, i.e., $|\lambda_{l,m}(\mathcal{B}_{\mathrm{cons}})| < 1$ for all $l$ and $m$. Then, we
would like to show that $f(\lambda, \delta) \le 1$. Since $\lambda$ is generally complex-valued, we denote the real part of $\lambda$
by $\lambda_r$. Then, the term $|\lambda - \delta|^2$ in (120) is given by $|\lambda - \delta|^2 = |\lambda|^2 + \delta^2 - 2\lambda_r\delta$ and $f(\lambda, \delta)$ from (120)
becomes

\[
f(\lambda, \delta) = -|\lambda|^4 + 2(1 - \delta + \lambda_r\delta)|\lambda|^2. \tag{121}
\]

Since $f(\lambda, \delta)$ is linear in $\delta$, the maximum value of $f(\lambda, \delta)$ in (121) over $\delta$ occurs at the end points of
$\delta$. Since $\delta \in (0, 2)$ and $|\lambda_r - \delta| \le |\lambda - \delta| < 1$, we conclude that $0 < \delta < 1 + \lambda_r$. Substituting the end
points of $\delta$ into (121), we have

\[
f(\lambda, 0) = -(|\lambda|^2 - 1)^2 + 1 \le 1 \tag{122}
\]

\[
f(\lambda, 1 + \lambda_r) = -|\lambda|^4 + 2\lambda_r^2 |\lambda|^2 \le |\lambda|^4 \le 1 \tag{123}
\]

where we used the fact that $\lambda_r^2 \le |\lambda|^2$ and $|\lambda| \le 1$. We therefore established (65).
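
As a numerical sanity check of this bound, the following minimal Python sketch samples pairs $(\lambda, \delta)$ satisfying $|\lambda| \le 1$, $0 < \delta < 2$, and $|\lambda - \delta| < 1$, and evaluates $f(\lambda, \delta) = |\lambda|^2[1 + (1 - \delta)^2 - |\lambda - \delta|^2]$. The function names, the rejection sampler, and the sample sizes are illustrative assumptions and are not part of the original analysis; the check complements the proof and does not replace it.

```python
import random


def f(lam: complex, delta: float) -> float:
    """f(lam, delta) = |lam|^2 * [1 + (1 - delta)^2 - |lam - delta|^2]."""
    return abs(lam) ** 2 * (1.0 + (1.0 - delta) ** 2 - abs(lam - delta) ** 2)


def f_expanded(lam: complex, delta: float) -> float:
    """Expanded form: -|lam|^4 + 2 (1 - delta + Re(lam) delta) |lam|^2."""
    return -abs(lam) ** 4 + 2.0 * (1.0 - delta + lam.real * delta) * abs(lam) ** 2


def sample_feasible(rng: random.Random):
    """Rejection-sample (lam, delta) with |lam| <= 1, 0 < delta < 2, |lam - delta| < 1."""
    while True:
        lam = complex(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        delta = rng.uniform(0.0, 2.0)
        if abs(lam) <= 1.0 and delta > 0.0 and abs(lam - delta) < 1.0:
            return lam, delta


def max_f_over_samples(n: int = 20000, seed: int = 0) -> float:
    """Largest value of f observed over n feasible random samples."""
    rng = random.Random(seed)
    return max(f(*sample_feasible(rng)) for _ in range(n))


if __name__ == "__main__":
    # The observed maximum should not exceed 1 (up to rounding error).
    print(max_f_over_samples())
```

The sampled maximum approaches one only when $\delta$ is close to zero and $|\lambda|$ is close to one, which is the boundary case $f(\lambda, 0) = 1 - (|\lambda|^2 - 1)^2$.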

Let us now examine what happens when the step-size is such that $1 \le \mu\lambda_{\min}(R_u) < 2$. Again, from
(63) and Table III, we establish that $\mathrm{MSD}_{\mathrm{ncop}} \le \mathrm{MSD}_{\mathrm{cons}}$ by showing that the ratio of
the individual terms appearing in the sums (63) is upper bounded by one:

\[
\frac{1 - |\lambda_l(A) - \mu\lambda_m(R_u)|^2}{1 - (1 - \mu\lambda_m(R_u))^2} \le 1 \tag{124}
\]
for all $l$ and $m$. Establishing (124) is equivalent to showing that

\[
|\lambda_l(A) - \mu\lambda_m(R_u)|^2 - (1 - \mu\lambda_m(R_u))^2 = |\lambda|^2 - 2\lambda_r\delta - (1 - 2\delta) \ge 0 \tag{125}
\]

where we used the notation from (120). Relation (125) holds since $\delta \ge \mu\lambda_{\min}(R_u) \ge 1 \ge |\lambda| \ge |\lambda_r|$
and then

\[
|\lambda|^2 - 2\lambda_r\delta - (1 - 2\delta) \ge \lambda_r^2 - 2\lambda_r\delta - (1 - 2\delta) = (1 - \lambda_r)(2\delta - 1 - \lambda_r) \ge 0. \tag{126}
\]
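
The factorization used in this last step can be checked numerically. The sketch below is an illustration under assumed names (`gap`, `factored_bound`, `check_ordering`) that do not appear in the original text. It draws $\lambda$ from the unit disc and $\delta$ from $[1, 2)$, and confirms that $|\lambda - \delta|^2 - (1 - \delta)^2 \ge (1 - \lambda_r)(2\delta - 1 - \lambda_r) \ge 0$.

```python
import random


def gap(lam: complex, delta: float) -> float:
    """|lam - delta|^2 - (1 - delta)^2, the quantity that must be nonnegative."""
    return abs(lam - delta) ** 2 - (1.0 - delta) ** 2


def factored_bound(lam: complex, delta: float) -> float:
    """(1 - lam_r) (2 delta - 1 - lam_r), with lam_r the real part of lam."""
    lam_r = lam.real
    return (1.0 - lam_r) * (2.0 * delta - 1.0 - lam_r)


def check_ordering(n: int = 10000, seed: int = 0, tol: float = 1e-12) -> bool:
    """Return True if gap >= factored_bound >= 0 on n random samples."""
    rng = random.Random(seed)
    checked = 0
    while checked < n:
        lam = complex(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if abs(lam) > 1.0:
            continue  # keep only points inside the closed unit disc
        delta = rng.uniform(1.0, 2.0)
        bound = factored_bound(lam, delta)
        if gap(lam, delta) < bound - tol or bound < -tol:
            return False
        checked += 1
    return True


if __name__ == "__main__":
    print(check_ordering())
```

The difference between the two quantities is the squared imaginary part of $\lambda$, so the first inequality is tight exactly when $\lambda$ is real.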



Appendix E 
Proof of Theorem 5

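From the series forms of $\{\mathrm{MSD}_k(m)\}$ in Table IV, the difference between $\mathrm{MSD}_{\mathrm{cta},k}(m)$ and $\mathrm{MSD}_{\mathrm{ncop},k}(m)$
can be expressed as:

\[
\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{cta},k}(m) = \mu^2 \lambda_m(R_u) \sum_{j=0}^{\infty} (1 - \mu\lambda_m(R_u))^{2j}\, e_k^T \left(\Sigma_v - A^{Tj} \Sigma_v A^{j}\right) e_k. \tag{127}
\]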

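From (89), we have

\[
\lim_{j \to \infty} e_k^T \left(\Sigma_v - A^{Tj} \Sigma_v A^{j}\right) e_k = \sigma_{v,k}^2 - e_k^T r_1 s_1^T \Sigma_v s_1 r_1^T e_k. \tag{128}
\]

Therefore, for any $\epsilon > 0$, there exists an integer $J_m$ such that

\[
e_k^T \left(\Sigma_v - A^{Tj} \Sigma_v A^{j}\right) e_k \ge \sigma_{v,k}^2 - e_k^T r_1 s_1^T \Sigma_v s_1 r_1^T e_k - \epsilon \triangleq \Delta \tag{129}
\]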

for all $j \ge J_m$. From (90), $\Delta$ in (129) becomes $\Delta = \sigma_{v,k}^2 - s_1^T \Sigma_v s_1 / N - \epsilon$. From condition (91), we
are able to choose $\epsilon$ small enough such that $\Delta$ is strictly greater than zero. Therefore, expression (127)
is lower bounded by:

\[
\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{cta},k}(m) \ge \mu^2 \lambda_m(R_u) \left[ -z + \Delta \cdot \sum_{j=J_m}^{\infty} (1 - \mu\lambda_m(R_u))^{2j} \right] \tag{130}
\]

where the term $z > 0$ is an upper bound for the first $J_m$ terms of the summation in (127), i.e.,

\[
\left| \sum_{j=0}^{J_m - 1} (1 - \mu\lambda_m(R_u))^{2j}\, e_k^T \left(\Sigma_v - A^{Tj} \Sigma_v A^{j}\right) e_k \right| \le z < \infty. \tag{131}
\]

It can be verified that the series inside the brackets of (130) is strictly decreasing in $\mu \in (0, 1/\lambda_m(R_u))$.
In addition,

\[
\lim_{\mu \to 0} \left( \sum_{j=J_m}^{\infty} (1 - \mu\lambda_m(R_u))^{2j} \right) = \infty. \tag{132}
\]

Thus, there exists a $\mu_m^o > 0$ such that the sum inside the bracket of (130) becomes positive and, hence,

\[
\mathrm{MSD}_{\mathrm{ncop},k}(m) - \mathrm{MSD}_{\mathrm{cta},k}(m) > 0 \tag{133}
\]

for all $0 < \mu < \mu_m^o$. Repeating the above argument, we will obtain a collection of step-size bounds
$\{\mu_1^o, \mu_2^o, \ldots, \mu_M^o\}$. We then choose $\mu^o = \min\{\mu_1^o, \mu_2^o, \ldots, \mu_M^o\}$ so that relation (133) holds for all $m$.

Then, applying Lemma [U we arrive at (92) for any $\mu$ satisfying $0 < \mu < \mu^o$.
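
The existence of the step-size bound for a single mode can be illustrated numerically. The following Python sketch assumes given constants `z` and `Delta`, an eigenvalue `lam` standing for $\lambda_m(R_u)$, and an index `J` standing for $J_m$; these inputs and the function names are hypothetical and chosen only for illustration. It uses the closed form of the geometric tail, $\sum_{j \ge J} q^j = q^J / (1 - q)$ with $q = (1 - \mu\lambda)^2$, and bisects for the largest $\mu$ at which $-z + \Delta \cdot \sum_{j \ge J} q^j$ stays positive.

```python
def tail_sum(mu: float, lam: float, J: int) -> float:
    """Sum over j >= J of (1 - mu*lam)^(2j), valid for 0 < mu*lam < 2."""
    x = mu * lam
    q = (1.0 - x) ** 2
    # 1 - q equals x * (2 - x); this form avoids cancellation for small x.
    return q ** J / (x * (2.0 - x))


def bracket(mu: float, z: float, Delta: float, lam: float, J: int) -> float:
    """The bracketed term -z + Delta * tail_sum(mu)."""
    return -z + Delta * tail_sum(mu, lam, J)


def step_size_bound(z: float, Delta: float, lam: float, J: int,
                    iters: int = 100) -> float:
    """Bisection on (0, 1/lam) for the point where the bracket changes sign.

    The tail sum decreases in mu on this interval and grows without bound
    as mu -> 0, so the bracket is positive for every mu below the result.
    """
    lo, hi = 0.0, 1.0 / lam
    if bracket(hi, z, Delta, lam, J) > 0.0:
        return hi  # positive over the whole interval
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bracket(mid, z, Delta, lam, J) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    mu_o = step_size_bound(z=5.0, Delta=0.1, lam=2.0, J=3)
    print(mu_o, bracket(0.5 * mu_o, 5.0, 0.1, 2.0, 3) > 0.0)
```

Running the same routine for each eigenvalue of $R_u$ and taking the minimum of the results mirrors the choice of a single bound valid for all modes.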

Appendix F 

Condition (81) Implies Condition (91) when A is Primitive

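It follows from (81) that $A^{Tj} \Sigma_v A^{j} - A^{T(j+1)} \Sigma_v A^{j+1} \ge 0$ for any nonnegative integer $j$ and then

\[
\sum_{j=0}^{J} \left( A^{Tj} \Sigma_v A^{j} - A^{T(j+1)} \Sigma_v A^{j+1} \right) = \Sigma_v - A^{T(J+1)} \Sigma_v A^{J+1} \ge 0. \tag{134}
\]

Since $A$ is primitive, as $J$ tends to infinity, we get from (89) that

\[
\lim_{J \to \infty} \left( \Sigma_v - A^{T(J+1)} \Sigma_v A^{J+1} \right) = \Sigma_v - r_1 s_1^T \Sigma_v s_1 r_1^T \ge 0. \tag{135}
\]

Using (90), we conclude that

\[
\det\left( \Sigma_v - r_1 s_1^T \Sigma_v s_1 r_1^T \right) = \det(\Sigma_v) \cdot \det\left( I_N - \Sigma_v^{-1} \mathbb{1} \cdot \frac{s_1^T \Sigma_v s_1}{N} \mathbb{1}^T \right) \ge 0. \tag{136}
\]

Since for any column vectors $\{x, y\}$ of size $N$, it holds that $\det(I_N - x \cdot y^T) = 1 - y^T \cdot x$, relation (136)
implies that the following must hold:

\[
1 - \frac{s_1^T \Sigma_v s_1}{N} \cdot \mathbb{1}^T \Sigma_v^{-1} \mathbb{1} \ge 0. \tag{137}
\]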
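
The rank-one determinant identity $\det(I_N - x y^T) = 1 - y^T x$ used in this step can be confirmed on random vectors. The sketch below is a hedged illustration in plain Python; the helper names and the small Gaussian-elimination determinant are assumptions introduced here and are not taken from the original text.

```python
import random


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    d = 1.0
    for c in range(n):
        # Choose the row with the largest pivot magnitude in column c.
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return d


def identity_minus_outer(x, y):
    """Return the matrix I - x y^T for two equal-length vectors."""
    n = len(x)
    return [[(1.0 if i == j else 0.0) - x[i] * y[j] for j in range(n)]
            for i in range(n)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    # The two printed numbers should agree up to rounding error.
    print(det(identity_minus_outer(x, y)), 1.0 - dot(y, x))
```

Choosing $x = \Sigma_v^{-1} \mathbb{1}$ and $y$ proportional to $\mathbb{1}$ reduces the determinant of an $N \times N$ matrix to a scalar expression.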

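However, by the Cauchy-Schwarz inequality [47] and using the fact that $s_1^T \mathbb{1} / \sqrt{N} = 1$ from (90), we
have

\[
\left( \sum_{l=1}^{N} \sigma_{v,l}^{-2} \right) \left( \sum_{l=1}^{N} \frac{\sigma_{v,l}^2 s_{1,l}^2}{N} \right) \ge \left( \sum_{l=1}^{N} \frac{s_{1,l}}{\sqrt{N}} \right)^2 = \left( \frac{s_1^T \mathbb{1}}{\sqrt{N}} \right)^2 = 1 \tag{138}
\]

where $s_{1,l}$ denotes the $l$th entry of $s_1$. Therefore, relation (137) can hold only with equality in (138). In
turn, equality in (138) holds if, and only if, there exists a constant $c$ such that $s_{1,l} / \sqrt{N} = c \cdot \sigma_{v,l}^{-2}$ for all
$l$. By the fact that $s_1^T \mathbb{1} / \sqrt{N} = 1$, we get:

\[
c = \left( \sum_{l=1}^{N} \sigma_{v,l}^{-2} \right)^{-1} \tag{139}
\]

and arrive at (91) since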